Introduction

Purpose

The purpose of this notebook is to make a portfolio project to showcase some basic data science skills.

To do this, a tutorial from Dataquest will be followed.

This notebook reworks the Dataquest tutorial project. A different project with the same format will then be made in a separate notebook.

Topic

The SAT scores of high-school students from New York City will be analysed, along with several demographic metrics.

The SAT is a test used in the United States to assess students' readiness for higher education. Many universities require it for admission, so students have a strong incentive to do well on it.

The range of SAT scores has changed several times. For the datasets used in this notebook, the maximum score is 2400. The average SAT score of a high school's students is often used to rank schools.

There have been claims of race or gender bias in the SAT, so this analysis will try to look into these claims.

Basic datasets

NYC Open Data publishes data about New York City across different categories. In this case, the interest is in education data.

The basic datasets to be used are:

Additional datasets

To enhance the data from the basic datasets, several others will be added:

Background information

Before beginning the analysis, it's useful to have some background information to provide context for the data:

Data ingestion

The first part of the process is ingesting the data, or reading it in from the different sources (tabular files in this case).

To do this, the following code will:

  • Loop through the files
  • Read each file into a Pandas DataFrame
  • Put each DataFrame into a Python dictionary
In [1]:
# Import pandas, numpy, IPython's display helpers, pyplot and seaborn
import pandas as pd
import numpy as np
from IPython.display import Markdown, display
import matplotlib.pyplot as plt
import seaborn as sns
# Map the name of each dataset to the URL of its CSV file
files={
    # AP (College Board) Results
    'ap_2010':'https://data.cityofnewyork.us/api/views/itfs-ms3e/rows.csv',           
    # Class Size
    'class_size':'https://data.cityofnewyork.us/api/views/urz7-pzb3/rows.csv',
    # School Demographics and Accountability
    'demographics':'https://data.cityofnewyork.us/api/views/ihfw-zy9j/rows.csv', 
    # Graduation Outcomes
    'graduation':'https://data.cityofnewyork.us/api/views/vh2h-md7a/rows.csv',
    # High School Directory
    'hs_directory':'https://data.cityofnewyork.us/api/views/n3p6-zve2/rows.csv',
    # NYS Math Test Results By Grade
    'math_test_results':'https://data.cityofnewyork.us/api/views/jufi-gzgp/rows.csv',
    # SAT results
    'sat_results':'https://data.cityofnewyork.us/api/views/f9bf-2cp4/rows.csv',       
}
# Create an empty dictionary to hold each DataFrame
data={}
# Loop through the files
for k,v in files.items():
  # Read each file into a DataFrame
  d=pd.read_csv(v)
  data[k]=d

Once the data has been read, the .head() method can be called on each DataFrame in the data dictionary to print its first five rows.

In [2]:
# Loop through the keys and values of the dictionary
for k,v in data.items():
  # Print the name of each DataFrame, followed by its first five rows
  display(Markdown('## {}'.format(k)))
  display(v.head())

ap_2010

      DBN                             SchoolName  AP Test Takers  Total Exams Taken  Number of Exams with scores 3 4 or 5
0  01M448           UNIVERSITY NEIGHBORHOOD H.S.            39.0               49.0                                  10.0
1  01M450                 EAST SIDE COMMUNITY HS            19.0               21.0                                   NaN
2  01M515                    LOWER EASTSIDE PREP            24.0               26.0                                  24.0
3  01M539         NEW EXPLORATIONS SCI,TECH,MATH           255.0              377.0                                 191.0
4  02M296  High School of Hospitality Management             NaN                NaN                                   NaN

class_size

CSD BOROUGH SCHOOL CODE SCHOOL NAME GRADE PROGRAM TYPE CORE SUBJECT (MS CORE and 9-12 ONLY) CORE COURSE (MS CORE and 9-12 ONLY) SERVICE CATEGORY(K-9* ONLY) NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS DATA SOURCE SCHOOLWIDE PUPIL-TEACHER RATIO
0 1 M M015 P.S. 015 Roberto Clemente 0K GEN ED - - - 19.0 1.0 19.0 19.0 19.0 ATS NaN
1 1 M M015 P.S. 015 Roberto Clemente 0K CTT - - - 21.0 1.0 21.0 21.0 21.0 ATS NaN
2 1 M M015 P.S. 015 Roberto Clemente 01 GEN ED - - - 17.0 1.0 17.0 17.0 17.0 ATS NaN
3 1 M M015 P.S. 015 Roberto Clemente 01 CTT - - - 17.0 1.0 17.0 17.0 17.0 ATS NaN
4 1 M M015 P.S. 015 Roberto Clemente 02 GEN ED - - - 15.0 1.0 15.0 15.0 15.0 ATS NaN

demographics

DBN Name schoolyear fl_percent frl_percent total_enrollment prek k grade1 grade2 grade3 grade4 grade5 grade6 grade7 grade8 grade9 grade10 grade11 grade12 ell_num ell_percent sped_num sped_percent ctt_num selfcontained_num asian_num asian_per black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per
0 01M015 P.S. 015 ROBERTO CLEMENTE 20052006 89.4 NaN 281 15 36 40 33 38 52 29 38 NaN NaN NaN NaN NaN NaN 36.0 12.8 57.0 20.3 25 9 10 3.6 74 26.3 189 67.3 5 1.8 158.0 56.2 123.0 43.8
1 01M015 P.S. 015 ROBERTO CLEMENTE 20062007 89.4 NaN 243 15 29 39 38 34 42 46 NaN NaN NaN NaN NaN NaN NaN 38.0 15.6 55.0 22.6 19 15 18 7.4 68 28.0 153 63.0 4 1.6 140.0 57.6 103.0 42.4
2 01M015 P.S. 015 ROBERTO CLEMENTE 20072008 89.4 NaN 261 18 43 39 36 38 47 40 NaN NaN NaN NaN NaN NaN NaN 52.0 19.9 60.0 23.0 20 14 16 6.1 77 29.5 157 60.2 7 2.7 143.0 54.8 118.0 45.2
3 01M015 P.S. 015 ROBERTO CLEMENTE 20082009 89.4 NaN 252 17 37 44 32 34 39 49 NaN NaN NaN NaN NaN NaN NaN 48.0 19.0 62.0 24.6 21 17 16 6.3 75 29.8 149 59.1 7 2.8 149.0 59.1 103.0 40.9
4 01M015 P.S. 015 ROBERTO CLEMENTE 20092010 96.5 208 16 40 28 32 30 24 38 NaN NaN NaN NaN NaN NaN NaN 40.0 19.2 46.0 22.1 14 14 16 7.7 67 32.2 118 56.7 6 2.9 124.0 59.6 84.0 40.4

graduation

Demographic DBN School Name Cohort Total Cohort Total Grads - n Total Grads - % of cohort Total Regents - n Total Regents - % of cohort Total Regents - % of grads Advanced Regents - n Advanced Regents - % of cohort Advanced Regents - % of grads Regents w/o Advanced - n Regents w/o Advanced - % of cohort Regents w/o Advanced - % of grads Local - n Local - % of cohort Local - % of grads Still Enrolled - n Still Enrolled - % of cohort Dropped Out - n Dropped Out - % of cohort
0 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2003 5 s NaN s NaN NaN s NaN NaN s NaN NaN s NaN NaN s NaN s NaN
1 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2004 55 37 67.3 17 30.9 45.9 0 0.0 0.0 17 30.9 45.9 20 36.4 54.1 15 27.3 3 5.5
2 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2005 64 43 67.2 27 42.2 62.8 0 0.0 0.0 27 42.2 62.8 16 25.0 37.2 9 14.1 9 14.1
3 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2006 78 43 55.1 36 46.2 83.7 0 0.0 0.0 36 46.2 83.7 7 9.0 16.3 16 20.5 11 14.1
4 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2006 Aug 78 44 56.4 37 47.4 84.1 0 0.0 0.0 37 47.4 84.1 7 9.0 15.9 15 19.2 11 14.1

hs_directory

dbn school_name borough building_code phone_number fax_number grade_span_min grade_span_max expgrade_span_min expgrade_span_max bus subway primary_address_line_1 city state_code postcode website total_students campus_name school_type overview_paragraph program_highlights language_classes advancedplacement_courses online_ap_courses online_language_courses extracurricular_activities psal_sports_boys psal_sports_girls psal_sports_coed school_sports partner_cbo partner_hospital partner_highered partner_cultural partner_nonprofit partner_corporate partner_financial partner_other addtl_info1 addtl_info2 start_time end_time se_services ell_programs school_accessibility_description number_programs priority01 priority02 priority03 priority04 priority05 priority06 priority07 priority08 priority09 priority10 Location 1 Community Board Council District Census Tract BIN BBL NTA
0 27Q260 Frederick Douglass Academy VI High School Queens Q465 718-471-2154 718-471-2890 9.0 12 NaN NaN Q113, Q22 A to Beach 25th St-Wavecrest 8-21 Bay 25 Street Far Rockaway NY 11691 http://schools.nyc.gov/schoolportals/27/Q260 412.0 Far Rockaway Educational Campus NaN Frederick Douglass Academy (FDA) VI High Schoo... Advisory, Graphic Arts Design, Teaching Intern... Spanish Calculus AB, English Language and Composition,... Biology, Physics B French, Spanish After-school Program, Book, Writing, Homework ... Basketball, Cross Country, Indoor Track, Outdo... Basketball, Cross Country, Indoor Track, Outdo... NaN Step Team, Modern Dance, Hip Hop Dance NaN Jamaica Hospital Medical Center, Peninsula Hos... York College, Brooklyn College, St. John's Col... NaN Queens District Attorney, Sports and Arts Foun... Replications, Inc. Citibank New York Road Runners Foundation (NYRRF) Uniform Required: plain white collared shirt, ... Extended Day Program, Student Summer Orientati... 7:45 AM 2:05 PM This school will provide students with disabil... ESL Not Functionally Accessible 1 Priority to Queens students or residents who a... Then to New York City residents who attend an ... Then to Queens students or residents Then to New York City residents NaN NaN NaN NaN NaN NaN 8 21 Bay 25 Street\nFar Rockaway, NY 11691\n(4... 14.0 31.0 100802.0 4300730.0 4.157360e+09 Far Rockaway-Bayswater ...
1 21K559 Life Academy High School for Film and Music Brooklyn K400 718-333-7750 718-333-7775 9.0 12 NaN NaN B1, B3, B4, B6, B64, B82 D to 25th Ave ; N to Ave U ; N to Gravesend - ... 2630 Benson Avenue Brooklyn NY 11214 http://schools.nyc.gov/schoolportals/21/K559 260.0 Lafayette Educational Campus NaN At Life Academy High School for Film and Music... College Now, iLEARN courses, Art and Film Prod... Spanish NaN Biology, English Literature and Composition, E... NaN Film, Music, Talent Show, Holiday Concert, Stu... Basketball, Bowling, Indoor Track, Soccer, Sof... Basketball, Bowling, Indoor Track, Soccer, Sof... Cricket NaN Coney Island Generation Gap NaN City Tech, Kingsborough Early College Secondar... Museum of the Moving Image, New York Public Li... Institute for Student Achievement Film Life, Inc., SONY Wonder Tech NaN NaN Our school requires completion of a Common Cor... NaN 8:15 AM 3:00 PM This school will provide students with disabil... ESL Functionally Accessible 1 Priority to New York City residents who attend... Then to New York City residents NaN NaN NaN NaN NaN NaN NaN NaN 2630 Benson Avenue\nBrooklyn, NY 11214\n(40.59... 13.0 47.0 306.0 3186454.0 3.068830e+09 Gravesend ...
2 16K393 Frederick Douglass Academy IV Secondary School Brooklyn K026 718-574-2820 718-574-2821 9.0 12 NaN NaN B15, B38, B46, B47, B52, B54, Q24 J to Kosciusko St ; M, Z to Myrtle Ave 1014 Lafayette Avenue Brooklyn NY 11221 http://schools.nyc.gov/schoolportals/16/K393 155.0 NaN NaN The Frederick Douglass Academy IV (FDA IV) Sec... College Now with Medgar Evers College, Fresh P... French, Spanish English Language and Composition, United State... French Language and Culture NaN After-school and Saturday Programs, Art Studio... NaN NaN NaN Basketball Team Achieving Change in our Neighborhood (Teen ACT... NaN Medgar Evers College Noel Pointer School of Music Hip-Hop 4 Life, Urban Arts, and St. Nicks Alli... NaN NaN NaN Dress Code Required: solid white shirt/blouse,... Student Summer Orientation, Weekend Program of... 8:00 AM 2:20 PM This school will provide students with disabil... ESL Not Functionally Accessible 1 Priority to continuing 8th graders Then to Brooklyn students or residents who att... Then to New York City residents who attend an ... Then to Brooklyn students or residents Then to New York City residents NaN NaN NaN NaN NaN 1014 Lafayette Avenue\nBrooklyn, NY 11221\n(40... 3.0 36.0 291.0 3393805.0 3.016160e+09 Stuyvesant Heights ...
3 08X305 Pablo Neruda Academy Bronx X450 718-824-1682 718-824-1663 9.0 12 NaN NaN Bx22, Bx27, Bx36, Bx39, Bx5 NaN 1980 Lafayette Avenue Bronx NY 10473 www.pablonerudaacademy.org 335.0 Adlai E. Stevenson Educational Campus NaN Our mission is to engage, inspire, and educate... Advanced Placement courses, Electives courses ... Spanish Art History, English Language and Composition,... NaN Spanish Youth Court, Student Government, Youth Service... Basketball, Outdoor Track, Softball, Tennis, V... Basketball, Outdoor Track, Softball, Tennis, V... NaN Baseball, Basketball, Flag Football, Soccer, S... NaN Soundview Health Center, Bronx Lebanon Hospita... Hostos Community College, Monroe College, Lehm... Chilean Consulate, Materials for the Arts Network for Teaching Entrepreneurship (NFTE), ... NaN NaN iLearnNYC All students are individually programmed (base... Extended Day Program 8:00 AM 3:50 PM This school will provide students with disabil... ESL Functionally Accessible 1 Priority to Bronx students or residents who at... Then to New York City residents who attend an ... Then to Bronx students or residents Then to New York City residents NaN NaN NaN NaN NaN NaN 1980 Lafayette Avenue\nBronx, NY 10473\n(40.82... 9.0 18.0 16.0 2022205.0 2.036040e+09 Soundview-Castle Hill-Clason Point-Harding Par...
4 03M485 Fiorello H. LaGuardia High School of Music & A... Manhattan M485 212-496-0700 212-724-5748 9.0 12 NaN NaN M10, M104, M11, M20, M31, M5, M57, M66, M7, M72 1 to 66th St - Lincoln Center ; 2, 3 to 72nd S... 100 Amsterdam Avenue New York NY 10023 www.laguardiahs.org 2730.0 NaN Specialized School We enjoy an international reputation as the fi... Students have a daily program that includes bo... French, Italian, Japanese, Spanish Art History, Biology, Calculus AB, Calculus BC... NaN Spanish Amnesty International, Anime, Annual Musical, ... Basketball, Bowling, Cross Country, Fencing, G... Basketball, Bowling, Cross Country, Fencing, G... NaN NaN Lincoln Center for the Performing Arts Mount Sinai Medical Center The Cooper Union for the Advancement of Scienc... Lincoln Center for the Performing Arts, Americ... Junior Achievement, Red Cross, United Nations ... Sony Music, Warner Music Group, Capital Cities... NaN NaN Chancellor’s Arts Endorsed Diploma NaN 8:00 AM 4:00 PM This school will provide students with disabil... ESL Functionally Accessible 6 Open to New York City residents Admission is based on the outcome of a competi... Students must audition for each program (studi... Students must be residents of New York City at... NaN NaN NaN NaN NaN NaN 100 Amsterdam Avenue\nNew York, NY 10023\n(40.... 7.0 6.0 151.0 1030341.0 1.011560e+09 Lincoln Square ...

math_test_results

DBN Grade Year Category Number Tested Mean Scale Score Level 1 # Level 1 % Level 2 # Level 2 % Level 3 # Level 3 % Level 4 # Level 4 % Level 3+4 # Level 3+4 %
0 01M015 3 2006 All Students 39 667.0 2.0 5.1 11.0 28.2 20.0 51.3 6.0 15.4 26.0 66.7
1 01M015 3 2007 All Students 31 672.0 2.0 6.5 3.0 9.7 22.0 71.0 4.0 12.9 26.0 83.9
2 01M015 3 2008 All Students 37 668.0 0.0 0.0 6.0 16.2 29.0 78.4 2.0 5.4 31.0 83.8
3 01M015 3 2009 All Students 33 668.0 0.0 0.0 4.0 12.1 28.0 84.8 1.0 3.0 29.0 87.9
4 01M015 3 2010 All Students 26 677.0 6.0 23.1 12.0 46.2 6.0 23.1 2.0 7.7 8.0 30.8

sat_results

      DBN                                    SCHOOL NAME  Num of SAT Test Takers  SAT Critical Reading Avg. Score  SAT Math Avg. Score  SAT Writing Avg. Score
0  01M292  HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES                      29                              355                  404                     363
1  01M448            UNIVERSITY NEIGHBORHOOD HIGH SCHOOL                      91                              383                  423                     366
2  01M450                     EAST SIDE COMMUNITY SCHOOL                      70                              377                  402                     370
3  01M458                      FORSYTH SATELLITE ACADEMY                       7                              414                  401                     359
4  01M509                        MARTA VALLE HIGH SCHOOL                      44                              390                  433                     384

Some useful details can be noted from the previous output:

  • Most of the datasets contain a DBN column, which can be used to uniquely identify each school
  • There is information that could be used to make a map (Location 1 column)
  • In some datasets there are multiple rows for each school, so preprocessing will be necessary
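
The first observation can also be checked programmatically instead of by eye. The following is a minimal sketch, assuming a dictionary of DataFrames shaped like data; the toy frames and the helper name find_dbn_column are made up for illustration and are not part of the original tutorial.

```python
import pandas as pd

# Toy stand-ins for the real DataFrames (hypothetical, only the relevant columns)
toy_data = {
    'sat_results': pd.DataFrame({'DBN': ['01M292', '01M448']}),
    'hs_directory': pd.DataFrame({'dbn': ['27Q260', '21K559']}),
    'class_size': pd.DataFrame({'CSD': [1, 1], 'SCHOOL CODE': ['M015', 'M015']}),
}

def find_dbn_column(df):
    """Return the name of the column matching 'DBN' ignoring case, or None."""
    for column in df.columns:
        if column.upper() == 'DBN':
            return column
    return None

# Record, for each dataset, which column (if any) plays the role of DBN
dbn_columns = {name: find_dbn_column(df) for name, df in toy_data.items()}
for name, column in dbn_columns.items():
    print(name, '->', column)
```

A result of None flags a dataset where the identifier has to be built from other columns, and a lowercase match flags one where only the column name differs.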

Combining the datasets

To make working with the data easier, all the individual datasets will be combined into a single one. To do this, a column that exists in every dataset must be found. From what was noted above, DBN might be that column.

Two of the datasets, hs_directory and class_size, don't contain a DBN column.

hs_directory dataset

In this case, the column exists but is called dbn, so it is enough to copy it into a new column named DBN.

In [3]:
# Copy the dbn column into a new DBN column
data['hs_directory']['DBN']=data['hs_directory']['dbn']
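
Assigning to a new column keeps both dbn and DBN in the DataFrame. As a hedged comparison, the sketch below contrasts that approach with DataFrame.rename on a toy frame; the two rows are stand-ins chosen for illustration.

```python
import pandas as pd

# Toy stand-in for hs_directory (only two columns, two rows)
hs_directory = pd.DataFrame({
    'dbn': ['27Q260', '21K559'],
    'borough': ['Queens', 'Brooklyn'],
})

# Approach used in this notebook: copy the values into a new DBN column
copied = hs_directory.copy()
copied['DBN'] = copied['dbn']

# Alternative: rename the column so that only DBN remains
renamed = hs_directory.rename(columns={'dbn': 'DBN'})

print(list(copied.columns))
print(list(renamed.columns))
```

Either form gives the later merge a column called DBN; the rename variant simply avoids carrying a duplicate column.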

class_size dataset

In the rest of the datasets, the DBN column looks like the following output.

In [4]:
display(data['demographics']['DBN'].head())
0    01M015
1    01M015
2    01M015
3    01M015
4    01M015
Name: DBN, dtype: object

But, looking at the first five class_size rows, no DBN column can be seen.

In [5]:
display(data['class_size'].head())
CSD BOROUGH SCHOOL CODE SCHOOL NAME GRADE PROGRAM TYPE CORE SUBJECT (MS CORE and 9-12 ONLY) CORE COURSE (MS CORE and 9-12 ONLY) SERVICE CATEGORY(K-9* ONLY) NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS DATA SOURCE SCHOOLWIDE PUPIL-TEACHER RATIO
0 1 M M015 P.S. 015 Roberto Clemente 0K GEN ED - - - 19.0 1.0 19.0 19.0 19.0 ATS NaN
1 1 M M015 P.S. 015 Roberto Clemente 0K CTT - - - 21.0 1.0 21.0 21.0 21.0 ATS NaN
2 1 M M015 P.S. 015 Roberto Clemente 01 GEN ED - - - 17.0 1.0 17.0 17.0 17.0 ATS NaN
3 1 M M015 P.S. 015 Roberto Clemente 01 CTT - - - 17.0 1.0 17.0 17.0 17.0 ATS NaN
4 1 M M015 P.S. 015 Roberto Clemente 02 GEN ED - - - 15.0 1.0 15.0 15.0 15.0 ATS NaN

From the previous output, it can be observed that the DBN can be formed by combining the CSD column, zero-padded to two digits, with the SCHOOL CODE column, which already begins with the borough letter.

Looking at the data dictionary of the dataset, the following description can be obtained:

  • DBN stands for District, Borough, School Number, and every school in the system can be identified by this 6-character code

  • The first two digits represent the school district (CSD, Community School District)

  • The third character signifies the borough (BOROUGH) in which the school is located (M = Manhattan, X = Bronx, R = Staten Island, K = Brooklyn and Q = Queens)

  • The final three digits are the school number and are unique within the borough; in class_size, the SCHOOL CODE column holds the borough letter followed by these three digits (for example, M015)

So the DBN column can be constructed for the class_size dataset.

In [6]:
# Combine CSD and SCHOOL CODE (which already includes the borough letter) to form DBN
# 02d formats an integer (d) to a field of minimum width 2 (2), with zero-padding on the left (0)
data['class_size']['DBN']=data['class_size'].apply(lambda x:'{0:02d}{1}'.format(x['CSD'],x['SCHOOL CODE']),axis=1)
# Print the first five rows of class_size
display(data['class_size'].head())
CSD BOROUGH SCHOOL CODE SCHOOL NAME GRADE PROGRAM TYPE CORE SUBJECT (MS CORE and 9-12 ONLY) CORE COURSE (MS CORE and 9-12 ONLY) SERVICE CATEGORY(K-9* ONLY) NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS DATA SOURCE SCHOOLWIDE PUPIL-TEACHER RATIO DBN
0 1 M M015 P.S. 015 Roberto Clemente 0K GEN ED - - - 19.0 1.0 19.0 19.0 19.0 ATS NaN 01M015
1 1 M M015 P.S. 015 Roberto Clemente 0K CTT - - - 21.0 1.0 21.0 21.0 21.0 ATS NaN 01M015
2 1 M M015 P.S. 015 Roberto Clemente 01 GEN ED - - - 17.0 1.0 17.0 17.0 17.0 ATS NaN 01M015
3 1 M M015 P.S. 015 Roberto Clemente 01 CTT - - - 17.0 1.0 17.0 17.0 17.0 ATS NaN 01M015
4 1 M M015 P.S. 015 Roberto Clemente 02 GEN ED - - - 15.0 1.0 15.0 15.0 15.0 ATS NaN 01M015
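
The same column can also be built without a row-wise apply. The sketch below is one possible vectorized variant, assuming CSD is stored as integers and SCHOOL CODE as strings, as in the output above; the third toy row is hypothetical and only there to show a two-digit district.

```python
import pandas as pd

# Toy stand-in for class_size; the last row is hypothetical
class_size = pd.DataFrame({
    'CSD': [1, 1, 12],
    'SCHOOL CODE': ['M015', 'M015', 'X999'],
})

# The format spec 02d pads a single integer with zeros on the left
single_value = '{:02d}'.format(1)

# str.zfill(2) applies the same left zero-padding to a whole column of strings
padded_csd = class_size['CSD'].astype(str).str.zfill(2)
class_size['DBN'] = padded_csd + class_size['SCHOOL CODE']

print(single_value)
print(class_size['DBN'].tolist())
```

On a DataFrame with many rows, string methods applied to whole columns usually run faster than apply with axis=1, although both produce the same DBN values here.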

School surveys

The NYC School Survey is taken every year by parents, teachers and students. It is meant to provide insight into the learning environment of schools, giving access to metrics that go beyond simple test scores.

The questions target the community's opinions on academic expectations, communication, engagement, and safety and respect.

In 2011, 960,191 surveys were submitted.

The following code will:

  • Read the surveys for all schools and the surveys for district 75
  • Add a flag that indicates whether each dataset comes from district 75
  • Combine the datasets
In [7]:
from zipfile import ZipFile
import urllib.request
import shutil

# Download the file from "url" and save it to "file_name"
url = 'https://data.cityofnewyork.us/api/views/mnz3-dyi8/files/220be57a-7c05-48dc-94de-b11a882ca9da?download=true&filename=2011%20School%20Survey.zip'
file_name='survey_file.zip'
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
# Extract all files in the zip (extractall returns None, so nothing is assigned)
with ZipFile(file_name) as zip_file:
    zip_file.extractall()
# Read the surveys for all schools and the surveys for district 75
# Use tab as delimiter and windows-1252 as file encoding
survey_all=pd.read_csv('2011 data files online/masterfile11_gened_final.txt',delimiter='\t',encoding='windows-1252')
survey_d75=pd.read_csv('2011 data files online/masterfile11_d75_final.txt',delimiter='\t',encoding='windows-1252')
# Add a flag, True if the school district is 75, False otherwise
survey_all['d75']=False
survey_d75['d75']=True
# Concatenate the datasets into a single DataFrame
survey=pd.concat([survey_all,survey_d75],axis=0)
# Print the first five rows of survey
display(survey.head())
dbn bn schoolname d75 studentssurveyed highschool schooltype rr_s rr_t rr_p N_s N_t N_p nr_s nr_t nr_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11 p_q2h p_q7a p_q7b p_q7c p_q7d p_q8a p_q8b p_q8c ... s_q10_4 s_q11a_1 s_q11a_2 s_q11a_3 s_q11a_4 s_q11b_1 s_q11b_2 s_q11b_3 s_q11b_4 s_q11c_1 s_q11c_2 s_q11c_3 s_q11c_4 s_q12d_1 s_q12d_2 s_q12d_3 s_q12d_4 s_q12e_1 s_q12e_2 s_q12e_3 s_q12e_4 s_q12f_1 s_q12f_2 s_q12f_3 s_q12f_4 s_q12g_1 s_q12g_2 s_q12g_3 s_q12g_4 s_q14_1 s_q14_2 s_q14_3 s_q14_4 s_q14_5 s_q14_6 s_q14_7 s_q14_8 s_q14_9 s_q14_10 s_q14_11
0 01M015 M015 P.S. 015 Roberto Clemente False No 0.0 Elementary School NaN 88 60 NaN 22.0 90.0 0 25 150 8.5 7.6 7.5 7.8 7.5 7.8 7.6 7.9 NaN NaN NaN NaN 8.0 7.7 7.5 7.9 8.0 8.2 8.3 7.5 7.9 6.8 8.7 9.7 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 01M019 M019 P.S. 019 Asher Levy False No 0.0 Elementary School NaN 100 60 NaN 34.0 161.0 0 33 269 8.4 7.6 7.6 7.8 8.6 8.5 8.9 9.1 NaN NaN NaN NaN 8.5 8.1 8.2 8.4 7.7 7.9 8.0 7.3 7.7 6.5 8.8 9.4 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 01M020 M020 P.S. 020 Anna Silver False No 0.0 Elementary School NaN 88 73 NaN 42.0 367.0 0 48 505 8.9 8.3 8.3 8.6 7.6 6.3 6.8 7.5 NaN NaN NaN NaN 8.2 7.3 7.5 8.0 8.1 8.8 8.9 8.5 8.4 7.6 9.2 9.4 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 01M034 M034 P.S. 034 Franklin D. Roosevelt False Yes 0.0 Elementary / Middle School 89.0 73 50 145.0 29.0 151.0 163 40 301 8.8 8.2 8.0 8.5 7.0 6.2 6.8 7.8 6.2 5.9 6.5 7.4 7.3 6.7 7.1 7.9 8.1 8.5 8.8 8.2 8.3 7.3 9.2 9.4 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 01M063 M063 P.S. 063 William McKinley False No 0.0 Elementary School NaN 100 60 NaN 23.0 90.0 0 23 151 8.7 7.9 8.1 7.9 8.4 7.3 7.8 8.1 NaN NaN NaN NaN 8.5 7.6 7.9 8.0 8.0 8.4 8.6 8.0 8.0 6.5 8.8 9.6 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 2773 columns
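
One detail of the concatenation step is worth illustrating: with axis=0 the rows are stacked and each DataFrame keeps its own index labels. The following sketch uses toy frames with hypothetical values (including the made-up DBN 75K004) to show the effect and the ignore_index option.

```python
import pandas as pd

# Toy stand-ins for the two survey files (hypothetical rows)
general = pd.DataFrame({'dbn': ['01M015', '01M019'], 'rr_t': [88, 100]})
district_75 = pd.DataFrame({'dbn': ['75K004'], 'rr_t': [90]})

# Flag the origin of each row before combining
general['d75'] = False
district_75['d75'] = True

# axis=0 stacks rows; the original index labels are kept, so label 0 appears twice
stacked = pd.concat([general, district_75], axis=0)
# ignore_index=True renumbers the rows from 0 to n-1 instead
renumbered = pd.concat([general, district_75], axis=0, ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```

Repeated index labels do not affect the column selection that follows, but they can be surprising when rows are later looked up by label.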

Looking at the columns

The survey DataFrame has 2773 columns. To be able to easily compare columns and calculate correlations, this number should be reduced.

The survey data came with a data dictionary (in spreadsheet format) that explains the meaning of each column.

In [8]:
display(pd.read_excel('2011 data files online/Survey Data Dictionary.xls'))
2011 NYC School Survey\nData Dictionary Unnamed: 1
0 This data dictionary can be used with the scho... NaN
1 NaN NaN
2 Field Name Field Description
3 dbn School identification code (district borough n...
4 sch_type School type (Elementary, Middle, High, etc)
5 location School name
6 enrollment Enrollment size
7 borough Borough
8 principal Principal name
9 studentsurvey Only students in grades 6-12 partipate in the ...
10 rr_s Student Response Rate
11 rr_t Teacher Response Rate
12 rr_p Parent Response Rate
13 N_s Number of student respondents
14 N_t Number of teacher respondents
15 N_p Number of parent respondents
16 nr_s Number of eligible students
17 nr_t Number of eligible teachers
18 nr_p Number of eligible parents
19 saf_p_10 Safety and Respect score based on parent respo...
20 com_p_10 Communication score based on parent responses
21 eng_p_10 Engagement score based on parent responses
22 aca_p_10 Academic expectations score based on parent re...
23 saf_t_10 Safety and Respect score based on teacher resp...
24 com_t_10 Communication score based on teacher responses
25 eng_t_10 Engagement score based on teacher responses
26 aca_t_10 Academic expectations score based on teacher r...
27 saf_s_10 Safety and Respect score based on student resp...
28 com_s_10 Communication score based on student responses
29 eng_s_10 Engagement score based on student responses
30 aca_s_10 Academic expectations score based on student r...
31 saf_tot_10 Safety and Respect total score
32 com_tot_10 Communication total score
33 eng_tot_10 Engagement total score
34 aca_tot_10 Academic Expectations total score
35 Field Series Field Series Description
36 Column AG through Column CA These fields provide scores determined for eac...
37 Column CB through Column LV These fields provide percentages of responses ...
38 Column LW through Column VQ These fields provide counts of responses from ...
39 Column VR through Column YS These fields provide scores determined for eac...
40 Column YT through Column AMY These fields provide percentages of responses ...
41 Column AMZ through Column BBE These fields provide counts of responses from ...
42 Column BBF through Column BDD These fields provide scores determined for eac...
43 Column BDE through Column BNB These fields provide percentages of responses ...
44 Column BNC through BWZ These fields provide counts of responses from ...
45 Field Convention Field Convention Description
46 p_q1 Indicates parent_question 1
47 p_q1a Indicates parent_question 1a
48 p_q1a_1 Indicates parent question_1a_response option 1
49 p_N_q1_1 Indicates parent_Number of responses_question ...
50 t_q1 Indicates teacher_question 1
51 t_q1a Indicates teacher_question 1a
52 t_q1a_1 Indicates teacher question_1a_response option 1
53 t_N_q1_1 Indicates teacher_Number of responses_question...
54 s_q1 Indicates student_question 1
55 s_q1a Indicates student_question 1a
56 s_q1a_1 Indicates student question_1a_response option 1
57 s_N_q1_1 Indicates student_Number of responses_question...

Not all columns will provide useful information for the analysis, so only the important ones will be kept:

  • DBN: District, Borough, School Number
  • rr_s: Student Response Rate
  • rr_t: Teacher Response Rate
  • rr_p: Parent Response Rate
  • N_s: Number of student respondents
  • N_t: Number of teacher respondents
  • N_p: Number of parent respondents
  • saf_p_11: Safety and Respect score based on parent responses
  • com_p_11: Communication score based on parent responses
  • eng_p_11: Engagement score based on parent responses
  • aca_p_11: Academic expectations score based on parent responses
  • saf_t_11: Safety and Respect score based on teacher responses
  • com_t_11: Communication score based on teacher responses
  • eng_t_11: Engagement score based on teacher responses
  • aca_t_11: Academic expectations score based on teacher responses
  • saf_s_11: Safety and Respect score based on student responses
  • com_s_11: Communication score based on student responses
  • eng_s_11: Engagement score based on student responses
  • aca_s_11: Academic expectations score based on student responses
  • saf_tot_11: Safety and Respect total score
  • com_tot_11: Communication total score
  • eng_tot_11: Engagement total score
  • aca_tot_11: Academic Expectations total score
In [9]:
# Print the shape of the survey DataFrame
display(Markdown('The `survey` DataFrame has a shape of {} before removing unwanted columns'.format(survey.shape)))
# Copy the dbn column into a new DBN column
survey['DBN']=survey['dbn']
# Make a list of the columns to be kept
survey_fields=[
               'DBN',
               'rr_s',
               'rr_t',
               'rr_p',
               'N_s',
               'N_t',
               'N_p',
               'saf_p_11',
               'com_p_11',
               'eng_p_11',
               'aca_p_11',
               'saf_t_11',
               'com_t_11',
               'eng_t_11',
               'aca_t_11',
               'saf_s_11',
               'com_s_11',
               'eng_s_11',
               'aca_s_11',
               'saf_tot_11',
               'com_tot_11',
               'eng_tot_11',
               'aca_tot_11',
               ]
# Keep only the columns from the survey_fields list
survey = survey.loc[:,survey_fields]
# Add the survey DataFrame to the data dictionary
data['survey'] = survey
# Print the first five rows of survey and its shape
display(survey.head())
display(Markdown('The `survey` DataFrame has a shape of {} after removing unwanted columns'.format(survey.shape)))

The survey DataFrame has a shape of (1702, 2773) before removing unwanted columns

DBN rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11
0 01M015 NaN 88 60 NaN 22.0 90.0 8.5 7.6 7.5 7.8 7.5 7.8 7.6 7.9 NaN NaN NaN NaN 8.0 7.7 7.5 7.9
1 01M019 NaN 100 60 NaN 34.0 161.0 8.4 7.6 7.6 7.8 8.6 8.5 8.9 9.1 NaN NaN NaN NaN 8.5 8.1 8.2 8.4
2 01M020 NaN 88 73 NaN 42.0 367.0 8.9 8.3 8.3 8.6 7.6 6.3 6.8 7.5 NaN NaN NaN NaN 8.2 7.3 7.5 8.0
3 01M034 89.0 73 50 145.0 29.0 151.0 8.8 8.2 8.0 8.5 7.0 6.2 6.8 7.8 6.2 5.9 6.5 7.4 7.3 6.7 7.1 7.9
4 01M063 NaN 100 60 NaN 23.0 90.0 8.7 7.9 8.1 7.9 8.4 7.3 7.8 8.1 NaN NaN NaN NaN 8.5 7.6 7.9 8.0

The survey DataFrame has a shape of (1702, 23) after removing unwanted columns
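
Selecting a fixed list of columns fails with a KeyError in recent pandas versions if any name is misspelled or absent. A small defensive sketch follows, assuming a toy survey frame with made-up values; the list wanted deliberately includes one column that the toy frame lacks.

```python
import pandas as pd

# Toy stand-in for the survey DataFrame (hypothetical values)
survey = pd.DataFrame({
    'dbn': ['01M015', '01M019'],
    'rr_t': [88, 100],
    'saf_tot_11': [8.0, 8.5],
    'p_q2h': [8.0, 7.7],
})
survey['DBN'] = survey['dbn']

# aca_tot_11 is absent from the toy frame on purpose
wanted = ['DBN', 'rr_t', 'saf_tot_11', 'aca_tot_11']

# Split the wanted names into those present and those missing
present = [c for c in wanted if c in survey.columns]
missing = [c for c in wanted if c not in survey.columns]

# Keep only the columns that actually exist
trimmed = survey.loc[:, present]
print(trimmed.shape, missing)
```

Printing the missing names before trimming makes a typo in the field list visible immediately.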

Understanding what each dataset contains, and which of its columns are relevant, is crucial to avoid wasting time and effort later on, during the analysis phase.

Condensing datasets

Looking again at the first rows of each dataset, it can be seen that in many cases there are several rows for each high school. In the sat_results dataset, however, there is only one row per high school.
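
Whether a dataset has one row per school can also be measured directly. The sketch below is a minimal example on toy frames with a handful of DBN values; it assumes every frame already has a DBN column.

```python
import pandas as pd

# Toy stand-ins: one frame with unique DBNs, one with repeated DBNs
toy_data = {
    'sat_results': pd.DataFrame({'DBN': ['01M292', '01M448', '01M450']}),
    'class_size': pd.DataFrame({'DBN': ['01M015', '01M015', '01M015', '01M019']}),
}

rows_per_school = {}
for name, df in toy_data.items():
    # value_counts counts how many rows share each DBN; max gives the worst case
    rows_per_school[name] = int(df['DBN'].value_counts().max())
    # is_unique is True only when no DBN is repeated
    print(name, 'unique DBN:', df['DBN'].is_unique,
          'max rows per DBN:', rows_per_school[name])
```

Any dataset whose maximum is above 1 needs to be condensed before it can be joined one-to-one with sat_results.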

In [10]:
# Loop through the keys and values of the dictionary
for k,v in data.items():
  # Print the name of each DataFrame, followed by its first five rows
  display(Markdown('## {}'.format(k)))
  display(v.head())

ap_2010

      DBN                             SchoolName  AP Test Takers  Total Exams Taken  Number of Exams with scores 3 4 or 5
0  01M448           UNIVERSITY NEIGHBORHOOD H.S.            39.0               49.0                                  10.0
1  01M450                 EAST SIDE COMMUNITY HS            19.0               21.0                                   NaN
2  01M515                    LOWER EASTSIDE PREP            24.0               26.0                                  24.0
3  01M539         NEW EXPLORATIONS SCI,TECH,MATH           255.0              377.0                                 191.0
4  02M296  High School of Hospitality Management             NaN                NaN                                   NaN

class_size

CSD BOROUGH SCHOOL CODE SCHOOL NAME GRADE PROGRAM TYPE CORE SUBJECT (MS CORE and 9-12 ONLY) CORE COURSE (MS CORE and 9-12 ONLY) SERVICE CATEGORY(K-9* ONLY) NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS DATA SOURCE SCHOOLWIDE PUPIL-TEACHER RATIO DBN
0 1 M M015 P.S. 015 Roberto Clemente 0K GEN ED - - - 19.0 1.0 19.0 19.0 19.0 ATS NaN 01M015
1 1 M M015 P.S. 015 Roberto Clemente 0K CTT - - - 21.0 1.0 21.0 21.0 21.0 ATS NaN 01M015
2 1 M M015 P.S. 015 Roberto Clemente 01 GEN ED - - - 17.0 1.0 17.0 17.0 17.0 ATS NaN 01M015
3 1 M M015 P.S. 015 Roberto Clemente 01 CTT - - - 17.0 1.0 17.0 17.0 17.0 ATS NaN 01M015
4 1 M M015 P.S. 015 Roberto Clemente 02 GEN ED - - - 15.0 1.0 15.0 15.0 15.0 ATS NaN 01M015

demographics

DBN Name schoolyear fl_percent frl_percent total_enrollment prek k grade1 grade2 grade3 grade4 grade5 grade6 grade7 grade8 grade9 grade10 grade11 grade12 ell_num ell_percent sped_num sped_percent ctt_num selfcontained_num asian_num asian_per black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per
0 01M015 P.S. 015 ROBERTO CLEMENTE 20052006 89.4 NaN 281 15 36 40 33 38 52 29 38 NaN NaN NaN NaN NaN NaN 36.0 12.8 57.0 20.3 25 9 10 3.6 74 26.3 189 67.3 5 1.8 158.0 56.2 123.0 43.8
1 01M015 P.S. 015 ROBERTO CLEMENTE 20062007 89.4 NaN 243 15 29 39 38 34 42 46 NaN NaN NaN NaN NaN NaN NaN 38.0 15.6 55.0 22.6 19 15 18 7.4 68 28.0 153 63.0 4 1.6 140.0 57.6 103.0 42.4
2 01M015 P.S. 015 ROBERTO CLEMENTE 20072008 89.4 NaN 261 18 43 39 36 38 47 40 NaN NaN NaN NaN NaN NaN NaN 52.0 19.9 60.0 23.0 20 14 16 6.1 77 29.5 157 60.2 7 2.7 143.0 54.8 118.0 45.2
3 01M015 P.S. 015 ROBERTO CLEMENTE 20082009 89.4 NaN 252 17 37 44 32 34 39 49 NaN NaN NaN NaN NaN NaN NaN 48.0 19.0 62.0 24.6 21 17 16 6.3 75 29.8 149 59.1 7 2.8 149.0 59.1 103.0 40.9
4 01M015 P.S. 015 ROBERTO CLEMENTE 20092010 96.5 208 16 40 28 32 30 24 38 NaN NaN NaN NaN NaN NaN NaN 40.0 19.2 46.0 22.1 14 14 16 7.7 67 32.2 118 56.7 6 2.9 124.0 59.6 84.0 40.4

graduation

Demographic DBN School Name Cohort Total Cohort Total Grads - n Total Grads - % of cohort Total Regents - n Total Regents - % of cohort Total Regents - % of grads Advanced Regents - n Advanced Regents - % of cohort Advanced Regents - % of grads Regents w/o Advanced - n Regents w/o Advanced - % of cohort Regents w/o Advanced - % of grads Local - n Local - % of cohort Local - % of grads Still Enrolled - n Still Enrolled - % of cohort Dropped Out - n Dropped Out - % of cohort
0 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2003 5 s NaN s NaN NaN s NaN NaN s NaN NaN s NaN NaN s NaN s NaN
1 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2004 55 37 67.3 17 30.9 45.9 0 0.0 0.0 17 30.9 45.9 20 36.4 54.1 15 27.3 3 5.5
2 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2005 64 43 67.2 27 42.2 62.8 0 0.0 0.0 27 42.2 62.8 16 25.0 37.2 9 14.1 9 14.1
3 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2006 78 43 55.1 36 46.2 83.7 0 0.0 0.0 36 46.2 83.7 7 9.0 16.3 16 20.5 11 14.1
4 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2006 Aug 78 44 56.4 37 47.4 84.1 0 0.0 0.0 37 47.4 84.1 7 9.0 15.9 15 19.2 11 14.1

hs_directory

dbn school_name borough building_code phone_number fax_number grade_span_min grade_span_max expgrade_span_min expgrade_span_max bus subway primary_address_line_1 city state_code postcode website total_students campus_name school_type overview_paragraph program_highlights language_classes advancedplacement_courses online_ap_courses online_language_courses extracurricular_activities psal_sports_boys psal_sports_girls psal_sports_coed school_sports partner_cbo partner_hospital partner_highered partner_cultural partner_nonprofit partner_corporate partner_financial partner_other addtl_info1 addtl_info2 start_time end_time se_services ell_programs school_accessibility_description number_programs priority01 priority02 priority03 priority04 priority05 priority06 priority07 priority08 priority09 priority10 Location 1 Community Board Council District Census Tract BIN BBL NTA DBN
0 27Q260 Frederick Douglass Academy VI High School Queens Q465 718-471-2154 718-471-2890 9.0 12 NaN NaN Q113, Q22 A to Beach 25th St-Wavecrest 8-21 Bay 25 Street Far Rockaway NY 11691 http://schools.nyc.gov/schoolportals/27/Q260 412.0 Far Rockaway Educational Campus NaN Frederick Douglass Academy (FDA) VI High Schoo... Advisory, Graphic Arts Design, Teaching Intern... Spanish Calculus AB, English Language and Composition,... Biology, Physics B French, Spanish After-school Program, Book, Writing, Homework ... Basketball, Cross Country, Indoor Track, Outdo... Basketball, Cross Country, Indoor Track, Outdo... NaN Step Team, Modern Dance, Hip Hop Dance NaN Jamaica Hospital Medical Center, Peninsula Hos... York College, Brooklyn College, St. John's Col... NaN Queens District Attorney, Sports and Arts Foun... Replications, Inc. Citibank New York Road Runners Foundation (NYRRF) Uniform Required: plain white collared shirt, ... Extended Day Program, Student Summer Orientati... 7:45 AM 2:05 PM This school will provide students with disabil... ESL Not Functionally Accessible 1 Priority to Queens students or residents who a... Then to New York City residents who attend an ... Then to Queens students or residents Then to New York City residents NaN NaN NaN NaN NaN NaN 8 21 Bay 25 Street\nFar Rockaway, NY 11691\n(4... 14.0 31.0 100802.0 4300730.0 4.157360e+09 Far Rockaway-Bayswater ... 27Q260
1 21K559 Life Academy High School for Film and Music Brooklyn K400 718-333-7750 718-333-7775 9.0 12 NaN NaN B1, B3, B4, B6, B64, B82 D to 25th Ave ; N to Ave U ; N to Gravesend - ... 2630 Benson Avenue Brooklyn NY 11214 http://schools.nyc.gov/schoolportals/21/K559 260.0 Lafayette Educational Campus NaN At Life Academy High School for Film and Music... College Now, iLEARN courses, Art and Film Prod... Spanish NaN Biology, English Literature and Composition, E... NaN Film, Music, Talent Show, Holiday Concert, Stu... Basketball, Bowling, Indoor Track, Soccer, Sof... Basketball, Bowling, Indoor Track, Soccer, Sof... Cricket NaN Coney Island Generation Gap NaN City Tech, Kingsborough Early College Secondar... Museum of the Moving Image, New York Public Li... Institute for Student Achievement Film Life, Inc., SONY Wonder Tech NaN NaN Our school requires completion of a Common Cor... NaN 8:15 AM 3:00 PM This school will provide students with disabil... ESL Functionally Accessible 1 Priority to New York City residents who attend... Then to New York City residents NaN NaN NaN NaN NaN NaN NaN NaN 2630 Benson Avenue\nBrooklyn, NY 11214\n(40.59... 13.0 47.0 306.0 3186454.0 3.068830e+09 Gravesend ... 21K559
2 16K393 Frederick Douglass Academy IV Secondary School Brooklyn K026 718-574-2820 718-574-2821 9.0 12 NaN NaN B15, B38, B46, B47, B52, B54, Q24 J to Kosciusko St ; M, Z to Myrtle Ave 1014 Lafayette Avenue Brooklyn NY 11221 http://schools.nyc.gov/schoolportals/16/K393 155.0 NaN NaN The Frederick Douglass Academy IV (FDA IV) Sec... College Now with Medgar Evers College, Fresh P... French, Spanish English Language and Composition, United State... French Language and Culture NaN After-school and Saturday Programs, Art Studio... NaN NaN NaN Basketball Team Achieving Change in our Neighborhood (Teen ACT... NaN Medgar Evers College Noel Pointer School of Music Hip-Hop 4 Life, Urban Arts, and St. Nicks Alli... NaN NaN NaN Dress Code Required: solid white shirt/blouse,... Student Summer Orientation, Weekend Program of... 8:00 AM 2:20 PM This school will provide students with disabil... ESL Not Functionally Accessible 1 Priority to continuing 8th graders Then to Brooklyn students or residents who att... Then to New York City residents who attend an ... Then to Brooklyn students or residents Then to New York City residents NaN NaN NaN NaN NaN 1014 Lafayette Avenue\nBrooklyn, NY 11221\n(40... 3.0 36.0 291.0 3393805.0 3.016160e+09 Stuyvesant Heights ... 16K393
3 08X305 Pablo Neruda Academy Bronx X450 718-824-1682 718-824-1663 9.0 12 NaN NaN Bx22, Bx27, Bx36, Bx39, Bx5 NaN 1980 Lafayette Avenue Bronx NY 10473 www.pablonerudaacademy.org 335.0 Adlai E. Stevenson Educational Campus NaN Our mission is to engage, inspire, and educate... Advanced Placement courses, Electives courses ... Spanish Art History, English Language and Composition,... NaN Spanish Youth Court, Student Government, Youth Service... Basketball, Outdoor Track, Softball, Tennis, V... Basketball, Outdoor Track, Softball, Tennis, V... NaN Baseball, Basketball, Flag Football, Soccer, S... NaN Soundview Health Center, Bronx Lebanon Hospita... Hostos Community College, Monroe College, Lehm... Chilean Consulate, Materials for the Arts Network for Teaching Entrepreneurship (NFTE), ... NaN NaN iLearnNYC All students are individually programmed (base... Extended Day Program 8:00 AM 3:50 PM This school will provide students with disabil... ESL Functionally Accessible 1 Priority to Bronx students or residents who at... Then to New York City residents who attend an ... Then to Bronx students or residents Then to New York City residents NaN NaN NaN NaN NaN NaN 1980 Lafayette Avenue\nBronx, NY 10473\n(40.82... 9.0 18.0 16.0 2022205.0 2.036040e+09 Soundview-Castle Hill-Clason Point-Harding Par... 08X305
4 03M485 Fiorello H. LaGuardia High School of Music & A... Manhattan M485 212-496-0700 212-724-5748 9.0 12 NaN NaN M10, M104, M11, M20, M31, M5, M57, M66, M7, M72 1 to 66th St - Lincoln Center ; 2, 3 to 72nd S... 100 Amsterdam Avenue New York NY 10023 www.laguardiahs.org 2730.0 NaN Specialized School We enjoy an international reputation as the fi... Students have a daily program that includes bo... French, Italian, Japanese, Spanish Art History, Biology, Calculus AB, Calculus BC... NaN Spanish Amnesty International, Anime, Annual Musical, ... Basketball, Bowling, Cross Country, Fencing, G... Basketball, Bowling, Cross Country, Fencing, G... NaN NaN Lincoln Center for the Performing Arts Mount Sinai Medical Center The Cooper Union for the Advancement of Scienc... Lincoln Center for the Performing Arts, Americ... Junior Achievement, Red Cross, United Nations ... Sony Music, Warner Music Group, Capital Cities... NaN NaN Chancellor’s Arts Endorsed Diploma NaN 8:00 AM 4:00 PM This school will provide students with disabil... ESL Functionally Accessible 6 Open to New York City residents Admission is based on the outcome of a competi... Students must audition for each program (studi... Students must be residents of New York City at... NaN NaN NaN NaN NaN NaN 100 Amsterdam Avenue\nNew York, NY 10023\n(40.... 7.0 6.0 151.0 1030341.0 1.011560e+09 Lincoln Square ... 03M485

math_test_results

DBN Grade Year Category Number Tested Mean Scale Score Level 1 # Level 1 % Level 2 # Level 2 % Level 3 # Level 3 % Level 4 # Level 4 % Level 3+4 # Level 3+4 %
0 01M015 3 2006 All Students 39 667.0 2.0 5.1 11.0 28.2 20.0 51.3 6.0 15.4 26.0 66.7
1 01M015 3 2007 All Students 31 672.0 2.0 6.5 3.0 9.7 22.0 71.0 4.0 12.9 26.0 83.9
2 01M015 3 2008 All Students 37 668.0 0.0 0.0 6.0 16.2 29.0 78.4 2.0 5.4 31.0 83.8
3 01M015 3 2009 All Students 33 668.0 0.0 0.0 4.0 12.1 28.0 84.8 1.0 3.0 29.0 87.9
4 01M015 3 2010 All Students 26 677.0 6.0 23.1 12.0 46.2 6.0 23.1 2.0 7.7 8.0 30.8

sat_results

DBN SCHOOL NAME Num of SAT Test Takers SAT Critical Reading Avg. Score SAT Math Avg. Score SAT Writing Avg. Score
0 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 29 355 404 363
1 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 91 383 423 366
2 01M450 EAST SIDE COMMUNITY SCHOOL 70 377 402 370
3 01M458 FORSYTH SATELLITE ACADEMY 7 414 401 359
4 01M509 MARTA VALLE HIGH SCHOOL 44 390 433 384

survey

DBN rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11
0 01M015 NaN 88 60 NaN 22.0 90.0 8.5 7.6 7.5 7.8 7.5 7.8 7.6 7.9 NaN NaN NaN NaN 8.0 7.7 7.5 7.9
1 01M019 NaN 100 60 NaN 34.0 161.0 8.4 7.6 7.6 7.8 8.6 8.5 8.9 9.1 NaN NaN NaN NaN 8.5 8.1 8.2 8.4
2 01M020 NaN 88 73 NaN 42.0 367.0 8.9 8.3 8.3 8.6 7.6 6.3 6.8 7.5 NaN NaN NaN NaN 8.2 7.3 7.5 8.0
3 01M034 89.0 73 50 145.0 29.0 151.0 8.8 8.2 8.0 8.5 7.0 6.2 6.8 7.8 6.2 5.9 6.5 7.4 7.3 6.7 7.1 7.9
4 01M063 NaN 100 60 NaN 23.0 90.0 8.7 7.9 8.1 7.9 8.4 7.3 7.8 8.1 NaN NaN NaN NaN 8.5 7.6 7.9 8.0

To be able to compare SAT scores with different variables, there should be only one row per high school.
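
One way to check this condition is to count how many DBN values repeat in each DataFrame. The sketch below is only illustrative: toy is a hypothetical dictionary holding two small made-up DataFrames that stand in for the real ones.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the real DataFrames (values are made up)
toy = {
    'sat_results': pd.DataFrame({'DBN': ['01M292', '01M448', '01M450']}),
    'class_size': pd.DataFrame({'DBN': ['01M015', '01M015', '01M292']}),
}

# For each DataFrame, count the rows whose DBN already appeared in an earlier row
repeated = {name: int(df['DBN'].duplicated().sum()) for name, df in toy.items()}
print(repeated)
```

A count of zero means the DataFrame already has one row per school. Any other value means it still needs condensing.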

Condensing class_size

In this dataset, the GRADE and PROGRAM TYPE columns have multiple values for each high school. By restricting each field to a single value, most of the duplicate rows can be filtered out.

The following code will:

  • Only select rows from class_size where the GRADE column equals 09-12 (high school, grades 9 through 12)
  • Only select rows from class_size where the PROGRAM TYPE column equals GEN ED (general education programs only)
  • Group class_size by DBN, then average each numeric column to get one row of class_size values per school

In [11]:
# Create a class_size DataFrame for ease of manipulation
class_size=data['class_size']
# Only select rows from class_size where the GRADE column equals 09-12
# (the column name used here, 'GRADE ', ends with a space)
class_size=class_size[class_size['GRADE ']=='09-12']
# Only select values from class_size where the PROGRAM TYPE column equals GEN ED
class_size=class_size[class_size['PROGRAM TYPE']=='GEN ED']
# Group class_size by DBN, then average the numeric columns (find the average class_size values for each school)
class_size=class_size.groupby('DBN').mean(numeric_only=True)
# Reset the index so DBN is a column again
class_size.reset_index(inplace=True)
# Replace the class_size DataFrame in the data dictionary by the class_size DataFrame just created
data['class_size']=class_size
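
The three steps can be tried out on a few rows to see their effect. The following is a minimal sketch, assuming a hypothetical toy DataFrame with made-up values that only mimics the layout of class_size. It calls mean(numeric_only=True) so that text columns are skipped explicitly.

```python
import pandas as pd

# Hypothetical rows that mimic the layout of class_size (values are made up)
toy = pd.DataFrame({
    'DBN': ['01M292', '01M292', '01M292', '01M448'],
    'GRADE ': ['09-12', '09-12', '0K', '09-12'],
    'PROGRAM TYPE': ['GEN ED', 'GEN ED', 'GEN ED', 'CTT'],
    'AVERAGE CLASS SIZE': [20.0, 25.0, 19.0, 30.0],
})

# Keep only high school, general education rows
mask = (toy['GRADE '] == '09-12') & (toy['PROGRAM TYPE'] == 'GEN ED')
# Average the numeric columns per school and turn DBN back into a column
condensed = toy[mask].groupby('DBN').mean(numeric_only=True).reset_index()
print(condensed)
```

In this toy case only one school survives both filters, and its two remaining rows are averaged into a single row.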

Condensing demographics

The dataset contains data for several school years, identified by the schoolyear column. Only the rows for the most recent school year (20112012, that is, 2011-2012) will be kept.

In [12]:
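# Create a demographics DataFrame for ease of manipulation
demographics=data['demographics']
# Only keep the rows for the most recent school year (2011-2012)
demographics=demographics[demographics['schoolyear']==20112012]
# Replace the demographics DataFrame in the data dictionary by the filtered one
data['demographics']=demographics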
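
The cell above hard-codes the most recent school year. An alternative, sketched below on a hypothetical toy DataFrame with made-up values, is to look up the latest value of schoolyear and filter on it. This assumes the column is numeric, so that the largest value is also the most recent year.

```python
import pandas as pd

# Hypothetical rows in the layout of demographics (values are made up)
toy = pd.DataFrame({
    'DBN': ['01M015', '01M015', '01M019', '01M019'],
    'schoolyear': [20102011, 20112012, 20102011, 20112012],
    'total_enrollment': [200, 189, 330, 328],
})

# Find the latest school year present in the data, then keep only its rows
latest = toy['schoolyear'].max()
recent = toy[toy['schoolyear'] == latest]
print(latest, len(recent))
```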

Condensing math_test_results

This dataset is segmented by Grade and Year. Only the data for the most recent year and highest grade will be kept.

In [13]:
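# Only keep the rows for the most recent year (Year is compared to a number)
data['math_test_results']=data['math_test_results'][data['math_test_results']['Year']==2011]
# Only keep the rows for the highest grade (Grade is compared to a string)
data['math_test_results']=data['math_test_results'][data['math_test_results']['Grade']=='8']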
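
Note that Year is compared to a number while Grade is compared to a string. Comparing against the wrong type does not raise an error, it simply matches nothing, so it can be worth checking the column types first. The sketch below uses a hypothetical toy DataFrame. Its values, including the 'All Grades' entry, are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows in the layout of math_test_results (values are made up)
toy = pd.DataFrame({
    'DBN': ['01M034', '01M034', '01M034'],
    'Grade': ['7', '8', 'All Grades'],
    'Year': [2011, 2011, 2011],
})

# Inspect the column types: here Year holds integers and Grade holds strings
print(toy.dtypes)

# Comparing Grade against the integer 8 silently matches no rows
wrong = toy[toy['Grade'] == 8]
# Comparing each column against a value of its own type matches as intended
right = toy[(toy['Year'] == 2011) & (toy['Grade'] == '8')]
print(len(wrong), len(right))
```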

Condensing graduation

In this case, the rows kept will be those for the class of 2010 (Cohort=2006) with metrics calculated across the entire cohort (Demographic=Total Cohort).

In [14]:
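# Only keep the rows for the 2006 cohort (Cohort is compared to a string)
data['graduation']=data['graduation'][data['graduation']['Cohort']=='2006']
# Only keep the rows with metrics calculated across the entire cohort
data['graduation']=data['graduation'][data['graduation']['Demographic']=='Total Cohort']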
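
The first rows of graduation shown earlier include a Cohort value of 2006 Aug next to 2006. Because the filter uses exact string equality, the 2006 Aug rows are left out, which is what leaves a single row per school. A minimal sketch on a hypothetical toy DataFrame follows. The values are made up, and the 'Male' value of Demographic is an assumption.

```python
import pandas as pd

# Hypothetical rows in the layout of graduation (values are made up)
toy = pd.DataFrame({
    'Demographic': ['Total Cohort', 'Total Cohort', 'Male'],
    'DBN': ['01M292', '01M292', '01M292'],
    'Cohort': ['2006', '2006 Aug', '2006'],
})

# Exact string equality keeps '2006' and leaves out '2006 Aug'
kept = toy[(toy['Cohort'] == '2006') & (toy['Demographic'] == 'Total Cohort')]
print(kept)
```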

Resulting data

Data cleaning is an important step to avoid arriving at nonsense conclusions (as the old adage goes: garbage in, garbage out).

The following code displays the resulting data.

In [15]:
# Loop through the keys and values of the dictionary
for k,v in data.items():
  # Print the name of each DataFrame, followed by its first five rows
  display(Markdown('##{}'.format(k)))
  display(v.head())

ap_2010

DBN SchoolName AP Test Takers Total Exams Taken Number of Exams with scores 3 4 or 5
0 01M448 UNIVERSITY NEIGHBORHOOD H.S. 39.0 49.0 10.0
1 01M450 EAST SIDE COMMUNITY HS 19.0 21.0 NaN
2 01M515 LOWER EASTSIDE PREP 24.0 26.0 24.0
3 01M539 NEW EXPLORATIONS SCI,TECH,MATH 255.0 377.0 191.0
4 02M296 High School of Hospitality Management NaN NaN NaN

class_size

DBN CSD NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS SCHOOLWIDE PUPIL-TEACHER RATIO
0 01M292 1 88.0000 4.000000 22.564286 18.50 26.571429 NaN
1 01M332 1 46.0000 2.000000 22.000000 21.00 23.500000 NaN
2 01M378 1 33.0000 1.000000 33.000000 33.00 33.000000 NaN
3 01M448 1 105.6875 4.750000 22.231250 18.25 27.062500 NaN
4 01M450 1 57.6000 2.733333 21.200000 19.40 22.866667 NaN

demographics

DBN Name schoolyear fl_percent frl_percent total_enrollment prek k grade1 grade2 grade3 grade4 grade5 grade6 grade7 grade8 grade9 grade10 grade11 grade12 ell_num ell_percent sped_num sped_percent ctt_num selfcontained_num asian_num asian_per black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per
6 01M015 P.S. 015 ROBERTO CLEMENTE 20112012 NaN 89.4 189 13 31 35 28 25 28 29 20.0 10.6 40.0 21.2 23 7 12 6.3 63 33.3 109 57.7 4 2.1 97.0 51.3 92.0 48.7
13 01M019 P.S. 019 ASHER LEVY 20112012 NaN 61.5 328 32 46 52 54 52 46 46 33.0 10.1 59.0 18.0 16 16 51 15.5 81 24.7 158 48.2 28 8.5 147.0 44.8 181.0 55.2
20 01M020 PS 020 ANNA SILVER 20112012 NaN 92.5 626 52 102 121 87 88 85 91 128.0 20.4 97.0 15.5 49 31 190 30.4 55 8.8 357 57.0 16 2.6 330.0 52.7 296.0 47.3
27 01M034 PS 034 FRANKLIN D ROOSEVELT 20112012 NaN 99.7 401 14 34 38 36 45 28 40 55 55 56 34.0 8.5 106.0 26.4 59 16 22 5.5 90 22.4 275 68.6 8 2.0 204.0 50.9 197.0 49.1
35 01M063 PS 063 WILLIAM MCKINLEY 20112012 NaN 78.9 176 18 20 30 21 31 26 30 6.0 3.4 45.0 25.6 34 4 9 5.1 41 23.3 110 62.5 15 8.5 97.0 55.1 79.0 44.9

graduation

Demographic DBN School Name Cohort Total Cohort Total Grads - n Total Grads - % of cohort Total Regents - n Total Regents - % of cohort Total Regents - % of grads Advanced Regents - n Advanced Regents - % of cohort Advanced Regents - % of grads Regents w/o Advanced - n Regents w/o Advanced - % of cohort Regents w/o Advanced - % of grads Local - n Local - % of cohort Local - % of grads Still Enrolled - n Still Enrolled - % of cohort Dropped Out - n Dropped Out - % of cohort
3 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2006 78 43 55.1 36 46.2 83.7 0 0.0 0.0 36 46.2 83.7 7 9.0 16.3 16 20.5 11 14.1
10 Total Cohort 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 2006 124 53 42.7 42 33.9 79.2 8 6.5 15.1 34 27.4 64.2 11 8.9 20.8 46 37.1 20 16.1
17 Total Cohort 01M450 EAST SIDE COMMUNITY SCHOOL 2006 90 70 77.8 67 74.4 95.7 0 0.0 0.0 67 74.4 95.7 3 3.3 4.3 15 16.7 5 5.6
24 Total Cohort 01M509 MARTA VALLE HIGH SCHOOL 2006 84 47 56.0 40 47.6 85.1 17 20.2 36.2 23 27.4 48.9 7 8.3 14.9 25 29.8 5 6.0
31 Total Cohort 01M515 LOWER EAST SIDE PREPARATORY HIGH SCHO 2006 193 105 54.4 91 47.2 86.7 69 35.8 65.7 22 11.4 21.0 14 7.3 13.3 53 27.5 35 18.1

hs_directory

dbn school_name borough building_code phone_number fax_number grade_span_min grade_span_max expgrade_span_min expgrade_span_max bus subway primary_address_line_1 city state_code postcode website total_students campus_name school_type overview_paragraph program_highlights language_classes advancedplacement_courses online_ap_courses online_language_courses extracurricular_activities psal_sports_boys psal_sports_girls psal_sports_coed school_sports partner_cbo partner_hospital partner_highered partner_cultural partner_nonprofit partner_corporate partner_financial partner_other addtl_info1 addtl_info2 start_time end_time se_services ell_programs school_accessibility_description number_programs priority01 priority02 priority03 priority04 priority05 priority06 priority07 priority08 priority09 priority10 Location 1 Community Board Council District Census Tract BIN BBL NTA DBN
0 27Q260 Frederick Douglass Academy VI High School Queens Q465 718-471-2154 718-471-2890 9.0 12 NaN NaN Q113, Q22 A to Beach 25th St-Wavecrest 8-21 Bay 25 Street Far Rockaway NY 11691 http://schools.nyc.gov/schoolportals/27/Q260 412.0 Far Rockaway Educational Campus NaN Frederick Douglass Academy (FDA) VI High Schoo... Advisory, Graphic Arts Design, Teaching Intern... Spanish Calculus AB, English Language and Composition,... Biology, Physics B French, Spanish After-school Program, Book, Writing, Homework ... Basketball, Cross Country, Indoor Track, Outdo... Basketball, Cross Country, Indoor Track, Outdo... NaN Step Team, Modern Dance, Hip Hop Dance NaN Jamaica Hospital Medical Center, Peninsula Hos... York College, Brooklyn College, St. John's Col... NaN Queens District Attorney, Sports and Arts Foun... Replications, Inc. Citibank New York Road Runners Foundation (NYRRF) Uniform Required: plain white collared shirt, ... Extended Day Program, Student Summer Orientati... 7:45 AM 2:05 PM This school will provide students with disabil... ESL Not Functionally Accessible 1 Priority to Queens students or residents who a... Then to New York City residents who attend an ... Then to Queens students or residents Then to New York City residents NaN NaN NaN NaN NaN NaN 8 21 Bay 25 Street\nFar Rockaway, NY 11691\n(4... 14.0 31.0 100802.0 4300730.0 4.157360e+09 Far Rockaway-Bayswater ... 27Q260
1 21K559 Life Academy High School for Film and Music Brooklyn K400 718-333-7750 718-333-7775 9.0 12 NaN NaN B1, B3, B4, B6, B64, B82 D to 25th Ave ; N to Ave U ; N to Gravesend - ... 2630 Benson Avenue Brooklyn NY 11214 http://schools.nyc.gov/schoolportals/21/K559 260.0 Lafayette Educational Campus NaN At Life Academy High School for Film and Music... College Now, iLEARN courses, Art and Film Prod... Spanish NaN Biology, English Literature and Composition, E... NaN Film, Music, Talent Show, Holiday Concert, Stu... Basketball, Bowling, Indoor Track, Soccer, Sof... Basketball, Bowling, Indoor Track, Soccer, Sof... Cricket NaN Coney Island Generation Gap NaN City Tech, Kingsborough Early College Secondar... Museum of the Moving Image, New York Public Li... Institute for Student Achievement Film Life, Inc., SONY Wonder Tech NaN NaN Our school requires completion of a Common Cor... NaN 8:15 AM 3:00 PM This school will provide students with disabil... ESL Functionally Accessible 1 Priority to New York City residents who attend... Then to New York City residents NaN NaN NaN NaN NaN NaN NaN NaN 2630 Benson Avenue\nBrooklyn, NY 11214\n(40.59... 13.0 47.0 306.0 3186454.0 3.068830e+09 Gravesend ... 21K559
2 16K393 Frederick Douglass Academy IV Secondary School Brooklyn K026 718-574-2820 718-574-2821 9.0 12 NaN NaN B15, B38, B46, B47, B52, B54, Q24 J to Kosciusko St ; M, Z to Myrtle Ave 1014 Lafayette Avenue Brooklyn NY 11221 http://schools.nyc.gov/schoolportals/16/K393 155.0 NaN NaN The Frederick Douglass Academy IV (FDA IV) Sec... College Now with Medgar Evers College, Fresh P... French, Spanish English Language and Composition, United State... French Language and Culture NaN After-school and Saturday Programs, Art Studio... NaN NaN NaN Basketball Team Achieving Change in our Neighborhood (Teen ACT... NaN Medgar Evers College Noel Pointer School of Music Hip-Hop 4 Life, Urban Arts, and St. Nicks Alli... NaN NaN NaN Dress Code Required: solid white shirt/blouse,... Student Summer Orientation, Weekend Program of... 8:00 AM 2:20 PM This school will provide students with disabil... ESL Not Functionally Accessible 1 Priority to continuing 8th graders Then to Brooklyn students or residents who att... Then to New York City residents who attend an ... Then to Brooklyn students or residents Then to New York City residents NaN NaN NaN NaN NaN 1014 Lafayette Avenue\nBrooklyn, NY 11221\n(40... 3.0 36.0 291.0 3393805.0 3.016160e+09 Stuyvesant Heights ... 16K393
3 08X305 Pablo Neruda Academy Bronx X450 718-824-1682 718-824-1663 9.0 12 NaN NaN Bx22, Bx27, Bx36, Bx39, Bx5 NaN 1980 Lafayette Avenue Bronx NY 10473 www.pablonerudaacademy.org 335.0 Adlai E. Stevenson Educational Campus NaN Our mission is to engage, inspire, and educate... Advanced Placement courses, Electives courses ... Spanish Art History, English Language and Composition,... NaN Spanish Youth Court, Student Government, Youth Service... Basketball, Outdoor Track, Softball, Tennis, V... Basketball, Outdoor Track, Softball, Tennis, V... NaN Baseball, Basketball, Flag Football, Soccer, S... NaN Soundview Health Center, Bronx Lebanon Hospita... Hostos Community College, Monroe College, Lehm... Chilean Consulate, Materials for the Arts Network for Teaching Entrepreneurship (NFTE), ... NaN NaN iLearnNYC All students are individually programmed (base... Extended Day Program 8:00 AM 3:50 PM This school will provide students with disabil... ESL Functionally Accessible 1 Priority to Bronx students or residents who at... Then to New York City residents who attend an ... Then to Bronx students or residents Then to New York City residents NaN NaN NaN NaN NaN NaN 1980 Lafayette Avenue\nBronx, NY 10473\n(40.82... 9.0 18.0 16.0 2022205.0 2.036040e+09 Soundview-Castle Hill-Clason Point-Harding Par... 08X305
4 03M485 Fiorello H. LaGuardia High School of Music & A... Manhattan M485 212-496-0700 212-724-5748 9.0 12 NaN NaN M10, M104, M11, M20, M31, M5, M57, M66, M7, M72 1 to 66th St - Lincoln Center ; 2, 3 to 72nd S... 100 Amsterdam Avenue New York NY 10023 www.laguardiahs.org 2730.0 NaN Specialized School We enjoy an international reputation as the fi... Students have a daily program that includes bo... French, Italian, Japanese, Spanish Art History, Biology, Calculus AB, Calculus BC... NaN Spanish Amnesty International, Anime, Annual Musical, ... Basketball, Bowling, Cross Country, Fencing, G... Basketball, Bowling, Cross Country, Fencing, G... NaN NaN Lincoln Center for the Performing Arts Mount Sinai Medical Center The Cooper Union for the Advancement of Scienc... Lincoln Center for the Performing Arts, Americ... Junior Achievement, Red Cross, United Nations ... Sony Music, Warner Music Group, Capital Cities... NaN NaN Chancellor’s Arts Endorsed Diploma NaN 8:00 AM 4:00 PM This school will provide students with disabil... ESL Functionally Accessible 6 Open to New York City residents Admission is based on the outcome of a competi... Students must audition for each program (studi... Students must be residents of New York City at... NaN NaN NaN NaN NaN NaN 100 Amsterdam Avenue\nNew York, NY 10023\n(40.... 7.0 6.0 151.0 1030341.0 1.011560e+09 Lincoln Square ... 03M485

math_test_results

DBN Grade Year Category Number Tested Mean Scale Score Level 1 # Level 1 % Level 2 # Level 2 % Level 3 # Level 3 % Level 4 # Level 4 % Level 3+4 # Level 3+4 %
111 01M034 8 2011 All Students 48 646.0 15.0 31.3 22.0 45.8 11.0 22.9 0.0 0.0 11.0 22.9
280 01M140 8 2011 All Students 61 665.0 1.0 1.6 43.0 70.5 17.0 27.9 0.0 0.0 17.0 27.9
346 01M184 8 2011 All Students 49 727.0 0.0 0.0 0.0 0.0 5.0 10.2 44.0 89.8 49.0 100.0
388 01M188 8 2011 All Students 49 658.0 10.0 20.4 26.0 53.1 10.0 20.4 3.0 6.1 13.0 26.5
411 01M292 8 2011 All Students 49 650.0 15.0 30.6 25.0 51.0 7.0 14.3 2.0 4.1 9.0 18.4

sat_results

DBN SCHOOL NAME Num of SAT Test Takers SAT Critical Reading Avg. Score SAT Math Avg. Score SAT Writing Avg. Score
0 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 29 355 404 363
1 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 91 383 423 366
2 01M450 EAST SIDE COMMUNITY SCHOOL 70 377 402 370
3 01M458 FORSYTH SATELLITE ACADEMY 7 414 401 359
4 01M509 MARTA VALLE HIGH SCHOOL 44 390 433 384

survey

DBN rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11
0 01M015 NaN 88 60 NaN 22.0 90.0 8.5 7.6 7.5 7.8 7.5 7.8 7.6 7.9 NaN NaN NaN NaN 8.0 7.7 7.5 7.9
1 01M019 NaN 100 60 NaN 34.0 161.0 8.4 7.6 7.6 7.8 8.6 8.5 8.9 9.1 NaN NaN NaN NaN 8.5 8.1 8.2 8.4
2 01M020 NaN 88 73 NaN 42.0 367.0 8.9 8.3 8.3 8.6 7.6 6.3 6.8 7.5 NaN NaN NaN NaN 8.2 7.3 7.5 8.0
3 01M034 89.0 73 50 145.0 29.0 151.0 8.8 8.2 8.0 8.5 7.0 6.2 6.8 7.8 6.2 5.9 6.5 7.4 7.3 6.7 7.1 7.9
4 01M063 NaN 100 60 NaN 23.0 90.0 8.7 7.9 8.1 7.9 8.4 7.3 7.8 8.1 NaN NaN NaN NaN 8.5 7.6 7.9 8.0

Computing variables

SAT score

The sat_results dataset has the average score for each of the three sections of the SAT exam in the SAT Critical Reading Avg. Score, SAT Math Avg. Score and SAT Writing Avg. Score columns. The total SAT score is simply the sum of these three columns.

In [16]:
# List of SAT score columns from the sat_results DataFrame
cols=['SAT Critical Reading Avg. Score','SAT Math Avg. Score','SAT Writing Avg. Score']
# Convert each column to numeric (entries that cannot be parsed become NaN)
data['sat_results'][cols]=data['sat_results'][cols].apply(pd.to_numeric,errors='coerce')
# Add the columns to get the total SAT score sat_score (sum skips NaN, so rows with no scores total 0.0)
data['sat_results']['sat_score']=data['sat_results'][cols].sum(axis=1)
display(data['sat_results'])

DBN SCHOOL NAME Num of SAT Test Takers SAT Critical Reading Avg. Score SAT Math Avg. Score SAT Writing Avg. Score sat_score
0 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 29 355.0 404.0 363.0 1122.0
1 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 91 383.0 423.0 366.0 1172.0
2 01M450 EAST SIDE COMMUNITY SCHOOL 70 377.0 402.0 370.0 1149.0
3 01M458 FORSYTH SATELLITE ACADEMY 7 414.0 401.0 359.0 1174.0
4 01M509 MARTA VALLE HIGH SCHOOL 44 390.0 433.0 384.0 1207.0
... ... ... ... ... ... ... ...
473 75X012 P.S. X012 LEWIS AND CLARK SCHOOL s NaN NaN NaN 0.0
474 75X754 J. M. RAPPORT SCHOOL CAREER DEVELOPMENT s NaN NaN NaN 0.0
475 79M645 SCHOOL FOR COOPERATIVE TECHNICAL EDUCATION s NaN NaN NaN 0.0
476 79Q950 GED PLUS s CITYWIDE 8 496.0 400.0 426.0 1322.0
477 79X490 PHOENIX ACADEMY 9 367.0 370.0 360.0 1097.0

478 rows × 7 columns
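
As rows 473 to 475 of the output show, schools with no reported scores end up with a sat_score of 0.0, because sum skips NaN values by default. If those schools should be treated as missing instead, one option is the min_count parameter of sum. The sketch below uses a hypothetical toy DataFrame with made-up rows. It is an alternative to consider, not what the cell above does.

```python
import numpy as np
import pandas as pd

# Hypothetical scores: the second school has no reported values
cols = ['SAT Critical Reading Avg. Score', 'SAT Math Avg. Score', 'SAT Writing Avg. Score']
toy = pd.DataFrame([[355.0, 404.0, 363.0], [np.nan, np.nan, np.nan]], columns=cols)

# Default behaviour: missing values are skipped, so an all-NaN row sums to 0.0
default_total = toy[cols].sum(axis=1)
# With min_count=3 the sum is NaN unless all three section scores are present
strict_total = toy[cols].sum(axis=1, min_count=3)
print(default_total.tolist(), strict_total.tolist())
```

Keeping NaN would stop a total of 0.0 from being read as a real score in later averages or correlations.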

Geographic coordinates

The location data for each school can be used to display the schools on a map. This information is contained in the Location 1 column of the hs_directory dataset.

A typical value of the Location 1 column consists of three lines:

284 Baltic Street  
Brooklyn, NY 11201  
(40.685451806, -73.993491465)

The third line contains the latitude and longitude coordinates of the school. These values will be extracted using string splitting.

In [17]:
# Split each value of Location 1 on line feeds, take the last element (the coordinates), remove the parentheses,
# split the result on ', ', and take the first (lat) and second (lon) elements
data['hs_directory']['lat'] = data['hs_directory']['Location 1'].apply(lambda x: x.split('\n')[-1].replace('(', '').replace(')', '').split(', ')[0])
data['hs_directory']['lon'] = data['hs_directory']['Location 1'].apply(lambda x: x.split('\n')[-1].replace('(', '').replace(')', '').split(', ')[1])
# Convert new columns to numeric
data['hs_directory'][['lat','lon']]=data['hs_directory'][['lat','lon']].apply(pd.to_numeric,errors='coerce')
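
The lambda functions above call split on every value, which would fail with an AttributeError if a Location 1 value were missing (NaN). A possible alternative is a regular expression with Series.str.extract, which leaves NaN where nothing matches. This is a minimal sketch: toy is a hypothetical DataFrame, and the pattern is an assumption based on the three-line example value shown earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical column in the layout of Location 1; the second value is missing
toy = pd.DataFrame({'Location 1': [
    '284 Baltic Street\nBrooklyn, NY 11201\n(40.685451806, -73.993491465)',
    np.nan,
]})

# Capture the two numbers inside the parentheses; rows without a match become NaN
pattern = r'\((?P<lat>-?\d+\.?\d*),\s*(?P<lon>-?\d+\.?\d*)\)'
coords = toy['Location 1'].str.extract(pattern).astype(float)
print(coords)
```

The named groups in the pattern become the lat and lon columns of the result, so both coordinates are obtained in one pass.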

The result of computing the new variables is shown below.

In [18]:
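# Loop through the keys and values of the dictionary
for k,v in data.items():
  # Print the name of each DataFrame, followed by its first five rows
  display(Markdown('##{}'.format(k)))
  display(v.head())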

ap_2010

DBN SchoolName AP Test Takers Total Exams Taken Number of Exams with scores 3 4 or 5
0 01M448 UNIVERSITY NEIGHBORHOOD H.S. 39.0 49.0 10.0
1 01M450 EAST SIDE COMMUNITY HS 19.0 21.0 NaN
2 01M515 LOWER EASTSIDE PREP 24.0 26.0 24.0
3 01M539 NEW EXPLORATIONS SCI,TECH,MATH 255.0 377.0 191.0
4 02M296 High School of Hospitality Management NaN NaN NaN

class_size

DBN CSD NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS SCHOOLWIDE PUPIL-TEACHER RATIO
0 01M292 1 88.0000 4.000000 22.564286 18.50 26.571429 NaN
1 01M332 1 46.0000 2.000000 22.000000 21.00 23.500000 NaN
2 01M378 1 33.0000 1.000000 33.000000 33.00 33.000000 NaN
3 01M448 1 105.6875 4.750000 22.231250 18.25 27.062500 NaN
4 01M450 1 57.6000 2.733333 21.200000 19.40 22.866667 NaN

demographics

DBN Name schoolyear fl_percent frl_percent total_enrollment prek k grade1 grade2 grade3 grade4 grade5 grade6 grade7 grade8 grade9 grade10 grade11 grade12 ell_num ell_percent sped_num sped_percent ctt_num selfcontained_num asian_num asian_per black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per
6 01M015 P.S. 015 ROBERTO CLEMENTE 20112012 NaN 89.4 189 13 31 35 28 25 28 29 20.0 10.6 40.0 21.2 23 7 12 6.3 63 33.3 109 57.7 4 2.1 97.0 51.3 92.0 48.7
13 01M019 P.S. 019 ASHER LEVY 20112012 NaN 61.5 328 32 46 52 54 52 46 46 33.0 10.1 59.0 18.0 16 16 51 15.5 81 24.7 158 48.2 28 8.5 147.0 44.8 181.0 55.2
20 01M020 PS 020 ANNA SILVER 20112012 NaN 92.5 626 52 102 121 87 88 85 91 128.0 20.4 97.0 15.5 49 31 190 30.4 55 8.8 357 57.0 16 2.6 330.0 52.7 296.0 47.3
27 01M034 PS 034 FRANKLIN D ROOSEVELT 20112012 NaN 99.7 401 14 34 38 36 45 28 40 55 55 56 34.0 8.5 106.0 26.4 59 16 22 5.5 90 22.4 275 68.6 8 2.0 204.0 50.9 197.0 49.1
35 01M063 PS 063 WILLIAM MCKINLEY 20112012 NaN 78.9 176 18 20 30 21 31 26 30 6.0 3.4 45.0 25.6 34 4 9 5.1 41 23.3 110 62.5 15 8.5 97.0 55.1 79.0 44.9

graduation

Demographic DBN School Name Cohort Total Cohort Total Grads - n Total Grads - % of cohort Total Regents - n Total Regents - % of cohort Total Regents - % of grads Advanced Regents - n Advanced Regents - % of cohort Advanced Regents - % of grads Regents w/o Advanced - n Regents w/o Advanced - % of cohort Regents w/o Advanced - % of grads Local - n Local - % of cohort Local - % of grads Still Enrolled - n Still Enrolled - % of cohort Dropped Out - n Dropped Out - % of cohort
3 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2006 78 43 55.1 36 46.2 83.7 0 0.0 0.0 36 46.2 83.7 7 9.0 16.3 16 20.5 11 14.1
10 Total Cohort 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 2006 124 53 42.7 42 33.9 79.2 8 6.5 15.1 34 27.4 64.2 11 8.9 20.8 46 37.1 20 16.1
17 Total Cohort 01M450 EAST SIDE COMMUNITY SCHOOL 2006 90 70 77.8 67 74.4 95.7 0 0.0 0.0 67 74.4 95.7 3 3.3 4.3 15 16.7 5 5.6
24 Total Cohort 01M509 MARTA VALLE HIGH SCHOOL 2006 84 47 56.0 40 47.6 85.1 17 20.2 36.2 23 27.4 48.9 7 8.3 14.9 25 29.8 5 6.0
31 Total Cohort 01M515 LOWER EAST SIDE PREPARATORY HIGH SCHO 2006 193 105 54.4 91 47.2 86.7 69 35.8 65.7 22 11.4 21.0 14 7.3 13.3 53 27.5 35 18.1

hs_directory

dbn school_name borough building_code phone_number fax_number grade_span_min grade_span_max expgrade_span_min expgrade_span_max bus subway primary_address_line_1 city state_code postcode website total_students campus_name school_type overview_paragraph program_highlights language_classes advancedplacement_courses online_ap_courses online_language_courses extracurricular_activities psal_sports_boys psal_sports_girls psal_sports_coed school_sports partner_cbo partner_hospital partner_highered partner_cultural partner_nonprofit partner_corporate partner_financial partner_other addtl_info1 addtl_info2 start_time end_time se_services ell_programs school_accessibility_description number_programs priority01 priority02 priority03 priority04 priority05 priority06 priority07 priority08 priority09 priority10 Location 1 Community Board Council District Census Tract BIN BBL NTA DBN lat lon
0 27Q260 Frederick Douglass Academy VI High School Queens Q465 718-471-2154 718-471-2890 9.0 12 NaN NaN Q113, Q22 A to Beach 25th St-Wavecrest 8-21 Bay 25 Street Far Rockaway NY 11691 http://schools.nyc.gov/schoolportals/27/Q260 412.0 Far Rockaway Educational Campus NaN Frederick Douglass Academy (FDA) VI High Schoo... Advisory, Graphic Arts Design, Teaching Intern... Spanish Calculus AB, English Language and Composition,... Biology, Physics B French, Spanish After-school Program, Book, Writing, Homework ... Basketball, Cross Country, Indoor Track, Outdo... Basketball, Cross Country, Indoor Track, Outdo... NaN Step Team, Modern Dance, Hip Hop Dance NaN Jamaica Hospital Medical Center, Peninsula Hos... York College, Brooklyn College, St. John's Col... NaN Queens District Attorney, Sports and Arts Foun... Replications, Inc. Citibank New York Road Runners Foundation (NYRRF) Uniform Required: plain white collared shirt, ... Extended Day Program, Student Summer Orientati... 7:45 AM 2:05 PM This school will provide students with disabil... ESL Not Functionally Accessible 1 Priority to Queens students or residents who a... Then to New York City residents who attend an ... Then to Queens students or residents Then to New York City residents NaN NaN NaN NaN NaN NaN 8 21 Bay 25 Street\nFar Rockaway, NY 11691\n(4... 14.0 31.0 100802.0 4300730.0 4.157360e+09 Far Rockaway-Bayswater ... 27Q260 40.601989 -73.762834
1 21K559 Life Academy High School for Film and Music Brooklyn K400 718-333-7750 718-333-7775 9.0 12 NaN NaN B1, B3, B4, B6, B64, B82 D to 25th Ave ; N to Ave U ; N to Gravesend - ... 2630 Benson Avenue Brooklyn NY 11214 http://schools.nyc.gov/schoolportals/21/K559 260.0 Lafayette Educational Campus NaN At Life Academy High School for Film and Music... College Now, iLEARN courses, Art and Film Prod... Spanish NaN Biology, English Literature and Composition, E... NaN Film, Music, Talent Show, Holiday Concert, Stu... Basketball, Bowling, Indoor Track, Soccer, Sof... Basketball, Bowling, Indoor Track, Soccer, Sof... Cricket NaN Coney Island Generation Gap NaN City Tech, Kingsborough Early College Secondar... Museum of the Moving Image, New York Public Li... Institute for Student Achievement Film Life, Inc., SONY Wonder Tech NaN NaN Our school requires completion of a Common Cor... NaN 8:15 AM 3:00 PM This school will provide students with disabil... ESL Functionally Accessible 1 Priority to New York City residents who attend... Then to New York City residents NaN NaN NaN NaN NaN NaN NaN NaN 2630 Benson Avenue\nBrooklyn, NY 11214\n(40.59... 13.0 47.0 306.0 3186454.0 3.068830e+09 Gravesend ... 21K559 40.593594 -73.984729
2 16K393 Frederick Douglass Academy IV Secondary School Brooklyn K026 718-574-2820 718-574-2821 9.0 12 NaN NaN B15, B38, B46, B47, B52, B54, Q24 J to Kosciusko St ; M, Z to Myrtle Ave 1014 Lafayette Avenue Brooklyn NY 11221 http://schools.nyc.gov/schoolportals/16/K393 155.0 NaN NaN The Frederick Douglass Academy IV (FDA IV) Sec... College Now with Medgar Evers College, Fresh P... French, Spanish English Language and Composition, United State... French Language and Culture NaN After-school and Saturday Programs, Art Studio... NaN NaN NaN Basketball Team Achieving Change in our Neighborhood (Teen ACT... NaN Medgar Evers College Noel Pointer School of Music Hip-Hop 4 Life, Urban Arts, and St. Nicks Alli... NaN NaN NaN Dress Code Required: solid white shirt/blouse,... Student Summer Orientation, Weekend Program of... 8:00 AM 2:20 PM This school will provide students with disabil... ESL Not Functionally Accessible 1 Priority to continuing 8th graders Then to Brooklyn students or residents who att... Then to New York City residents who attend an ... Then to Brooklyn students or residents Then to New York City residents NaN NaN NaN NaN NaN 1014 Lafayette Avenue\nBrooklyn, NY 11221\n(40... 3.0 36.0 291.0 3393805.0 3.016160e+09 Stuyvesant Heights ... 16K393 40.692134 -73.931503
3 08X305 Pablo Neruda Academy Bronx X450 718-824-1682 718-824-1663 9.0 12 NaN NaN Bx22, Bx27, Bx36, Bx39, Bx5 NaN 1980 Lafayette Avenue Bronx NY 10473 www.pablonerudaacademy.org 335.0 Adlai E. Stevenson Educational Campus NaN Our mission is to engage, inspire, and educate... Advanced Placement courses, Electives courses ... Spanish Art History, English Language and Composition,... NaN Spanish Youth Court, Student Government, Youth Service... Basketball, Outdoor Track, Softball, Tennis, V... Basketball, Outdoor Track, Softball, Tennis, V... NaN Baseball, Basketball, Flag Football, Soccer, S... NaN Soundview Health Center, Bronx Lebanon Hospita... Hostos Community College, Monroe College, Lehm... Chilean Consulate, Materials for the Arts Network for Teaching Entrepreneurship (NFTE), ... NaN NaN iLearnNYC All students are individually programmed (base... Extended Day Program 8:00 AM 3:50 PM This school will provide students with disabil... ESL Functionally Accessible 1 Priority to Bronx students or residents who at... Then to New York City residents who attend an ... Then to Bronx students or residents Then to New York City residents NaN NaN NaN NaN NaN NaN 1980 Lafayette Avenue\nBronx, NY 10473\n(40.82... 9.0 18.0 16.0 2022205.0 2.036040e+09 Soundview-Castle Hill-Clason Point-Harding Par... 08X305 40.822304 -73.855961
4 03M485 Fiorello H. LaGuardia High School of Music & A... Manhattan M485 212-496-0700 212-724-5748 9.0 12 NaN NaN M10, M104, M11, M20, M31, M5, M57, M66, M7, M72 1 to 66th St - Lincoln Center ; 2, 3 to 72nd S... 100 Amsterdam Avenue New York NY 10023 www.laguardiahs.org 2730.0 NaN Specialized School We enjoy an international reputation as the fi... Students have a daily program that includes bo... French, Italian, Japanese, Spanish Art History, Biology, Calculus AB, Calculus BC... NaN Spanish Amnesty International, Anime, Annual Musical, ... Basketball, Bowling, Cross Country, Fencing, G... Basketball, Bowling, Cross Country, Fencing, G... NaN NaN Lincoln Center for the Performing Arts Mount Sinai Medical Center The Cooper Union for the Advancement of Scienc... Lincoln Center for the Performing Arts, Americ... Junior Achievement, Red Cross, United Nations ... Sony Music, Warner Music Group, Capital Cities... NaN NaN Chancellor’s Arts Endorsed Diploma NaN 8:00 AM 4:00 PM This school will provide students with disabil... ESL Functionally Accessible 6 Open to New York City residents Admission is based on the outcome of a competi... Students must audition for each program (studi... Students must be residents of New York City at... NaN NaN NaN NaN NaN NaN 100 Amsterdam Avenue\nNew York, NY 10023\n(40.... 7.0 6.0 151.0 1030341.0 1.011560e+09 Lincoln Square ... 03M485 40.773671 -73.985269

math_test_results

DBN Grade Year Category Number Tested Mean Scale Score Level 1 # Level 1 % Level 2 # Level 2 % Level 3 # Level 3 % Level 4 # Level 4 % Level 3+4 # Level 3+4 %
111 01M034 8 2011 All Students 48 646.0 15.0 31.3 22.0 45.8 11.0 22.9 0.0 0.0 11.0 22.9
280 01M140 8 2011 All Students 61 665.0 1.0 1.6 43.0 70.5 17.0 27.9 0.0 0.0 17.0 27.9
346 01M184 8 2011 All Students 49 727.0 0.0 0.0 0.0 0.0 5.0 10.2 44.0 89.8 49.0 100.0
388 01M188 8 2011 All Students 49 658.0 10.0 20.4 26.0 53.1 10.0 20.4 3.0 6.1 13.0 26.5
411 01M292 8 2011 All Students 49 650.0 15.0 30.6 25.0 51.0 7.0 14.3 2.0 4.1 9.0 18.4

sat_results

   DBN     SCHOOL NAME                                    Num of SAT Test Takers  SAT Critical Reading Avg. Score  SAT Math Avg. Score  SAT Writing Avg. Score  sat_score
0  01M292  HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES  29                      355.0                            404.0                363.0                   1122.0
1  01M448  UNIVERSITY NEIGHBORHOOD HIGH SCHOOL            91                      383.0                            423.0                366.0                   1172.0
2  01M450  EAST SIDE COMMUNITY SCHOOL                     70                      377.0                            402.0                370.0                   1149.0
3  01M458  FORSYTH SATELLITE ACADEMY                      7                       414.0                            401.0                359.0                   1174.0
4  01M509  MARTA VALLE HIGH SCHOOL                        44                      390.0                            433.0                384.0                   1207.0

survey

DBN rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11
0 01M015 NaN 88 60 NaN 22.0 90.0 8.5 7.6 7.5 7.8 7.5 7.8 7.6 7.9 NaN NaN NaN NaN 8.0 7.7 7.5 7.9
1 01M019 NaN 100 60 NaN 34.0 161.0 8.4 7.6 7.6 7.8 8.6 8.5 8.9 9.1 NaN NaN NaN NaN 8.5 8.1 8.2 8.4
2 01M020 NaN 88 73 NaN 42.0 367.0 8.9 8.3 8.3 8.6 7.6 6.3 6.8 7.5 NaN NaN NaN NaN 8.2 7.3 7.5 8.0
3 01M034 89.0 73 50 145.0 29.0 151.0 8.8 8.2 8.0 8.5 7.0 6.2 6.8 7.8 6.2 5.9 6.5 7.4 7.3 6.7 7.1 7.9
4 01M063 NaN 100 60 NaN 23.0 90.0 8.7 7.9 8.1 7.9 8.4 7.3 7.8 8.1 NaN NaN NaN NaN 8.5 7.6 7.9 8.0

Combining the datasets

In this stage, the datasets will be combined on the DBN column.

Not all datasets have the same number of rows, because some high schools appear in some datasets but not in others. When joining, it's important not to lose any relevant data.

Since the interest is in SAT scores, any high school that doesn't appear in sat_results will be ignored. To do this, every join will be a left join, with sat_results as the left table.

The code below will:

  • Loop through each dataset
  • Perform a left join
In [19]:
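# Start from sat_results so that only schools with SAT data are kept
full=data['sat_results']
for k,v in data.items():
  # Left-join every other dataset onto the running result, matching on DBN
  if k!='sat_results':
    full=full.merge(v,on='DBN',how='left')
full.shape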
Out[19]:
(479, 180)
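
As an illustration of what the left join keeps and drops, here is a minimal sketch on two toy tables. The frames `sat` and `other`, the DBN `99X999` and the enrollment figures are made up for the example and are not taken from the real datasets.

```python
import pandas as pd

# Toy stand-ins for sat_results and one other dataset (hypothetical values)
sat = pd.DataFrame({'DBN': ['01M292', '01M448', '01M450'],
                    'sat_score': [1122.0, 1172.0, 1149.0]})
other = pd.DataFrame({'DBN': ['01M292', '01M450', '99X999'],
                      'total_enrollment': [422, 598, 100]})

# Left join: every row of `sat` is kept, matches from `other` are attached
merged = sat.merge(other, on='DBN', how='left')

# '01M448' has no match in `other`, so its total_enrollment is NaN.
# '99X999' only exists in `other`, so it does not appear in the result.
print(merged)
```

Note that a left join can still add rows when the right table holds more than one row for the same DBN, which is why the row count should be checked after merging.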

Deal with duplicates

It's important to deal with duplicate values so that they don't distort the metrics to be calculated.

In this case, only one school (DBN 04M610) appears twice, possibly because of a typo in the source data. The second row contains NA values in the AP columns, so dropping it and keeping the first one solves the problem.

In [20]:
# Check for duplicates
display(full[full['DBN'].duplicated(keep=False)])
# Drop duplicates, keep first occurrence
full.drop_duplicates(subset='DBN',keep='first',inplace=True)
display(full.shape)
DBN SCHOOL NAME Num of SAT Test Takers SAT Critical Reading Avg. Score SAT Math Avg. Score SAT Writing Avg. Score sat_score SchoolName AP Test Takers Total Exams Taken Number of Exams with scores 3 4 or 5 CSD NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS SCHOOLWIDE PUPIL-TEACHER RATIO Name schoolyear fl_percent frl_percent total_enrollment prek k grade1 grade2 grade3 grade4 grade5 grade6 grade7 grade8 grade9 grade10 grade11 grade12 ell_num ell_percent sped_num ... NTA lat lon Grade Year Category Number Tested Mean Scale Score Level 1 # Level 1 % Level 2 # Level 2 % Level 3 # Level 3 % Level 4 # Level 4 % Level 3+4 # Level 3+4 % rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11
95 04M610 YOUNG WOMEN'S LEADERSHIP SCHOOL 70 432.0 446.0 448.0 1326.0 THE YOUNG WOMEN'S LEADERSHIP SCHOOL OF EAST HA... 41.0 55.0 29.0 4.0 68.294118 2.705882 25.711765 23.0 27.941176 NaN YOUNG WOMEN'S LEADERSHIP SCHOOL 20112012.0 NaN 67.3 449.0 53 50 58 77 78 68 65 10.0 2.2 30.0 ... East Harlem South ... 40.792707 -73.9473 8 2011.0 All Students 48.0 689.0 0.0 0.0 13.0 27.1 24.0 50.0 11.0 22.9 35.0 72.9 85.0 96.0 53.0 380.0 24.0 221.0 8.8 8.1 7.8 8.3 8.9 8.3 8.5 9.0 7.5 6.6 7.2 8.3 8.4 7.7 7.8 8.5
96 04M610 YOUNG WOMEN'S LEADERSHIP SCHOOL 70 432.0 446.0 448.0 1326.0 YOUNG WOMEN'S LEADERSHIP SCH NaN NaN NaN 4.0 68.294118 2.705882 25.711765 23.0 27.941176 NaN YOUNG WOMEN'S LEADERSHIP SCHOOL 20112012.0 NaN 67.3 449.0 53 50 58 77 78 68 65 10.0 2.2 30.0 ... East Harlem South ... 40.792707 -73.9473 8 2011.0 All Students 48.0 689.0 0.0 0.0 13.0 27.1 24.0 50.0 11.0 22.9 35.0 72.9 85.0 96.0 53.0 380.0 24.0 221.0 8.8 8.1 7.8 8.3 8.9 8.3 8.5 9.0 7.5 6.6 7.2 8.3 8.4 7.7 7.8 8.5

2 rows × 180 columns

(478, 180)
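
The two pandas calls used above behave differently with respect to the `keep` argument. The following sketch shows this on a toy frame; the column name `ap_takers` and the values are hypothetical.

```python
import pandas as pd

# Toy frame with one DBN repeated (hypothetical values)
toy = pd.DataFrame({'DBN': ['04M610', '04M610', '01M292'],
                    'ap_takers': [41.0, None, 0.0]})

# keep=False flags every row of a duplicated group, which is useful for inspection
dupes = toy[toy['DBN'].duplicated(keep=False)]

# keep='first' retains only the first row of each duplicated group
deduped = toy.drop_duplicates(subset='DBN', keep='first')

print(dupes)
print(deduped)
```

Keeping the first occurrence only gives the desired result when the more complete row comes first, so it is worth inspecting the duplicates before dropping them.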

AP exam results

To perform computations with the AP exam results, the data must first be converted to numeric form. Any missing values are then filled in with 0, on the assumption that a missing value means no AP exams were taken.

In [21]:
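# The trailing space in 'AP Test Takers ' is intentional: it matches the column name
AP_cols=['AP Test Takers ','Total Exams Taken','Number of Exams with scores 3 4 or 5']
# Convert to numeric; values that can't be parsed become NaN
full[AP_cols]=full[AP_cols].apply(pd.to_numeric,errors='coerce')
# Replace missing values with 0
full[AP_cols]=full[AP_cols].fillna(value=0)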
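
To make the effect of `errors='coerce'` concrete, here is a small sketch on a toy Series. The raw values, including the `'s'` placeholder for a non-numeric entry, are assumptions made for the example.

```python
import pandas as pd

# Hypothetical raw column: numeric strings, a non-numeric placeholder and a missing value
raw = pd.Series(['39', 's', None, '19'])

# errors='coerce' turns anything that can't be parsed into NaN instead of raising
numeric = pd.to_numeric(raw, errors='coerce')

# Missing values are then replaced by 0
filled = numeric.fillna(0)

print(filled)
```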

Create a school district column

As explained before, the first two digits of the DBN represent the school's district. This number can be used to calculate district-level metrics and plot them on a map.

In [22]:
# Create a school_dist column that contains the first two characters from the DBN column
full['school_dist']=full['DBN'].apply(lambda x:x[:2])
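
As a sketch of how the district code could feed a district-level metric, the example below extracts it with the vectorized `.str` accessor (equivalent to the `apply` call above for string values) and averages a score per district. The toy rows, in particular the `02M047` entry and its score, are made up.

```python
import pandas as pd

# Toy frame (the last row is hypothetical)
toy = pd.DataFrame({'DBN': ['01M292', '01M448', '02M047'],
                    'sat_score': [1122.0, 1172.0, 1182.0]})

# First two characters of the DBN are the school district
toy['school_dist'] = toy['DBN'].str[:2]

# District-level metric: mean SAT score per district
district_mean = toy.groupby('school_dist')['sat_score'].mean()

print(district_mean)
```

The district is kept as a string (for example `'01'`), which preserves the leading zero.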

Imputation

Many computations require data without missing values.

To accomplish this, imputation will be used. Imputation is the process of replacing missing values with substitute values. In this case, the mean of each column will be used.

In [23]:
# Show quantity of missing values per column
with pd.option_context('display.max_columns', 500):
  display(pd.DataFrame(full.isna().sum()).transpose())
  # Drop columns with ONLY NA values
  full.dropna(how='all',inplace=True,axis=1)
  display(full.shape)
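  # Fill NA values with the mean of each numeric column
  # (numeric_only=True is needed in recent pandas versions, where mean() raises on text columns)
  full.fillna(full.mean(numeric_only=True),inplace=True)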
  display(pd.DataFrame(full.isna().sum()).transpose())
DBN SCHOOL NAME Num of SAT Test Takers SAT Critical Reading Avg. Score SAT Math Avg. Score SAT Writing Avg. Score sat_score SchoolName AP Test Takers Total Exams Taken Number of Exams with scores 3 4 or 5 CSD NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS SCHOOLWIDE PUPIL-TEACHER RATIO Name schoolyear fl_percent frl_percent total_enrollment prek k grade1 grade2 grade3 grade4 grade5 grade6 grade7 grade8 grade9 grade10 grade11 grade12 ell_num ell_percent sped_num sped_percent ctt_num selfcontained_num asian_num asian_per black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per Demographic School Name Cohort Total Cohort Total Grads - n Total Grads - % of cohort Total Regents - n Total Regents - % of cohort Total Regents - % of grads Advanced Regents - n Advanced Regents - % of cohort Advanced Regents - % of grads Regents w/o Advanced - n Regents w/o Advanced - % of cohort Regents w/o Advanced - % of grads Local - n Local - % of cohort Local - % of grads Still Enrolled - n Still Enrolled - % of cohort Dropped Out - n Dropped Out - % of cohort dbn school_name borough building_code phone_number fax_number grade_span_min grade_span_max expgrade_span_min expgrade_span_max bus subway primary_address_line_1 city state_code postcode website total_students campus_name school_type overview_paragraph program_highlights language_classes advancedplacement_courses online_ap_courses online_language_courses extracurricular_activities psal_sports_boys psal_sports_girls psal_sports_coed school_sports partner_cbo partner_hospital partner_highered partner_cultural partner_nonprofit partner_corporate partner_financial partner_other addtl_info1 addtl_info2 start_time end_time se_services ell_programs school_accessibility_description number_programs priority01 priority02 priority03 priority04 priority05 priority06 priority07 priority08 priority09 priority10 Location 1 
Community Board Council District Census Tract BIN BBL NTA lat lon Grade Year Category Number Tested Mean Scale Score Level 1 # Level 1 % Level 2 # Level 2 % Level 3 # Level 3 % Level 4 # Level 4 % Level 3+4 # Level 3+4 % rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11 school_dist
0 0 0 0 57 57 57 0 225 0 0 0 44 44 44 44 44 44 478 28 28 478 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 89 89 89 89 89 111 89 111 112 89 111 112 89 111 112 89 111 112 89 111 89 111 109 109 109 109 109 110 111 109 478 477 110 172 109 109 109 109 109 109 309 406 109 110 121 170 425 422 109 156 156 351 218 159 293 131 194 202 293 406 303 176 285 110 110 109 109 109 109 109 187 291 357 448 466 476 478 478 478 109 111 111 111 112 112 111 109 109 395 395 395 395 396 396 396 396 396 396 396 396 396 396 396 8 8 8 12 8 9 9 9 9 9 8 8 8 8 12 12 12 12 8 8 8 8 0
(478, 175)
DBN SCHOOL NAME Num of SAT Test Takers SAT Critical Reading Avg. Score SAT Math Avg. Score SAT Writing Avg. Score sat_score SchoolName AP Test Takers Total Exams Taken Number of Exams with scores 3 4 or 5 CSD NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS Name schoolyear frl_percent total_enrollment prek k grade1 grade2 grade3 grade4 grade5 grade6 grade7 grade8 grade9 grade10 grade11 grade12 ell_num ell_percent sped_num sped_percent ctt_num selfcontained_num asian_num asian_per black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per Demographic School Name Cohort Total Cohort Total Grads - n Total Grads - % of cohort Total Regents - n Total Regents - % of cohort Total Regents - % of grads Advanced Regents - n Advanced Regents - % of cohort Advanced Regents - % of grads Regents w/o Advanced - n Regents w/o Advanced - % of cohort Regents w/o Advanced - % of grads Local - n Local - % of cohort Local - % of grads Still Enrolled - n Still Enrolled - % of cohort Dropped Out - n Dropped Out - % of cohort dbn school_name borough building_code phone_number fax_number grade_span_min grade_span_max expgrade_span_max bus subway primary_address_line_1 city state_code postcode website total_students campus_name school_type overview_paragraph program_highlights language_classes advancedplacement_courses online_ap_courses online_language_courses extracurricular_activities psal_sports_boys psal_sports_girls psal_sports_coed school_sports partner_cbo partner_hospital partner_highered partner_cultural partner_nonprofit partner_corporate partner_financial partner_other addtl_info1 addtl_info2 start_time end_time se_services ell_programs school_accessibility_description number_programs priority01 priority02 priority03 priority04 priority05 priority06 priority07 Location 1 Community Board Council District Census Tract BIN BBL NTA lat lon Grade Year Category Number 
Tested Mean Scale Score Level 1 # Level 1 % Level 2 # Level 2 % Level 3 # Level 3 % Level 4 # Level 4 % Level 3+4 # Level 3+4 % rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11 school_dist
0 0 0 0 0 0 0 0 225 0 0 0 0 0 0 0 0 0 28 0 0 0 28 28 28 28 28 28 28 28 28 28 28 28 28 28 0 0 0 0 28 28 0 0 0 0 0 0 0 0 0 0 0 0 89 89 89 0 89 0 89 0 0 89 0 0 89 0 0 89 0 0 89 0 89 0 109 109 109 109 109 110 0 0 0 110 172 109 109 109 0 109 0 309 406 109 110 121 170 425 422 109 156 156 351 218 159 293 131 194 202 293 406 303 176 285 110 110 109 109 109 0 109 187 291 357 448 466 476 109 0 0 0 0 0 111 0 0 395 0 395 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

In the output above, many columns still have missing values. This is because imputation was performed only on numeric columns.

If only the numeric columns are checked, none of them has missing values.

In [24]:
# Show quantity of missing values per numeric column
with pd.option_context('display.max_columns', 500):
  display(pd.DataFrame(full.select_dtypes(include=['number']).isna().sum()).transpose())
SAT Critical Reading Avg. Score SAT Math Avg. Score SAT Writing Avg. Score sat_score AP Test Takers Total Exams Taken Number of Exams with scores 3 4 or 5 CSD NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS schoolyear frl_percent total_enrollment ell_num ell_percent sped_num sped_percent asian_num asian_per black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per Total Cohort Total Grads - % of cohort Total Regents - % of cohort Total Regents - % of grads Advanced Regents - % of cohort Advanced Regents - % of grads Regents w/o Advanced - % of cohort Regents w/o Advanced - % of grads Local - % of cohort Local - % of grads Still Enrolled - % of cohort Dropped Out - % of cohort grade_span_min grade_span_max expgrade_span_max postcode total_students number_programs Community Board Council District Census Tract BIN BBL lat lon Year Number Tested Mean Scale Score Level 1 # Level 1 % Level 2 # Level 2 % Level 3 # Level 3 % Level 4 # Level 4 % Level 3+4 # Level 3+4 % rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
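
The behaviour seen above, where numeric columns are filled and text columns are left alone, can be reproduced on a toy frame. This is a minimal sketch with made-up column names and values.

```python
import numpy as np
import pandas as pd

# Toy frame with one numeric and one text column, each with a missing value
toy = pd.DataFrame({'score': [400.0, np.nan, 500.0],
                    'borough': ['Manhattan', None, 'Bronx']})

# Column means, computed for numeric columns only
means = toy.mean(numeric_only=True)

# fillna with a Series fills each column using the entry with the same name,
# so columns without a computed mean (here 'borough') are left untouched
imputed = toy.fillna(means)

print(imputed)
```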

There are many missing values in the non-numeric columns. In this simple analysis, they will be left as is, but the issue should be dealt with if they are going to be used (for example, by replacing the missing values with the most frequent value, if that makes sense).

In [25]:
# Show quantity of missing values per non-numeric column
with pd.option_context('display.max_columns', 500):
  display(pd.DataFrame(full.select_dtypes(exclude=['number']).isna().sum()).transpose())
DBN SCHOOL NAME Num of SAT Test Takers SchoolName Name prek k grade1 grade2 grade3 grade4 grade5 grade6 grade7 grade8 grade9 grade10 grade11 grade12 ctt_num selfcontained_num Demographic School Name Cohort Total Grads - n Total Regents - n Advanced Regents - n Regents w/o Advanced - n Local - n Still Enrolled - n Dropped Out - n dbn school_name borough building_code phone_number fax_number bus subway primary_address_line_1 city state_code website campus_name school_type overview_paragraph program_highlights language_classes advancedplacement_courses online_ap_courses online_language_courses extracurricular_activities psal_sports_boys psal_sports_girls psal_sports_coed school_sports partner_cbo partner_hospital partner_highered partner_cultural partner_nonprofit partner_corporate partner_financial partner_other addtl_info1 addtl_info2 start_time end_time se_services ell_programs school_accessibility_description priority01 priority02 priority03 priority04 priority05 priority06 priority07 Location 1 NTA Grade Category school_dist
0 0 0 0 225 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 89 89 89 89 89 89 89 89 89 89 109 109 109 109 109 110 110 172 109 109 109 109 309 406 109 110 121 170 425 422 109 156 156 351 218 159 293 131 194 202 293 406 303 176 285 110 110 109 109 109 109 187 291 357 448 466 476 109 111 395 395 0
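
The most-frequent-value option mentioned above is not applied in this notebook. As a sketch of how it could look for a single text column, assuming a toy `borough` column with made-up values:

```python
import pandas as pd

# Toy text column with one missing value (hypothetical data)
toy = pd.DataFrame({'borough': ['Brooklyn', None, 'Brooklyn', 'Queens']})

# mode() ignores missing values by default and returns the most frequent value(s);
# the first one is taken in case of ties
most_frequent = toy['borough'].mode().iloc[0]

# Fill the gaps with that value, in a new column so the original stays visible
toy['borough_filled'] = toy['borough'].fillna(most_frequent)

print(toy)
```

Whether this makes sense depends on the column: it may be reasonable for a category such as borough, but not for free-text fields such as addresses or descriptions.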

Finally, since the interest is in SAT scores, any row with a sat_score of zero will be dropped. These are most likely the 57 schools whose section averages were missing before imputation.

In [26]:
full.drop(full[full['sat_score'] ==0].index, inplace=True)
display(full.shape)
(421, 175)
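
An equivalent way to express the same filter, shown here as a sketch on a toy frame with made-up DBNs, is to keep the rows that pass a boolean mask instead of dropping by index:

```python
import pandas as pd

# Toy frame where one school has a score of zero (hypothetical DBNs)
toy = pd.DataFrame({'DBN': ['a', 'b', 'c'],
                    'sat_score': [1122.0, 0.0, 1207.0]})

# Keep only the rows whose score is not zero
kept = toy[toy['sat_score'] != 0]

print(kept.shape)
```

This form returns a new DataFrame rather than modifying the original in place.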

Show the resulting dataset

The first twenty rows of the dataset are shown next.

In [27]:
with pd.option_context('display.max_columns', 500,'display.max_rows', 500):
  display(full.head(20))
  display(full.shape)
DBN SCHOOL NAME Num of SAT Test Takers SAT Critical Reading Avg. Score SAT Math Avg. Score SAT Writing Avg. Score sat_score SchoolName AP Test Takers Total Exams Taken Number of Exams with scores 3 4 or 5 CSD NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS Name schoolyear frl_percent total_enrollment prek k grade1 grade2 grade3 grade4 grade5 grade6 grade7 grade8 grade9 grade10 grade11 grade12 ell_num ell_percent sped_num sped_percent ctt_num selfcontained_num asian_num asian_per black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per Demographic School Name Cohort Total Cohort Total Grads - n Total Grads - % of cohort Total Regents - n Total Regents - % of cohort Total Regents - % of grads Advanced Regents - n Advanced Regents - % of cohort Advanced Regents - % of grads Regents w/o Advanced - n Regents w/o Advanced - % of cohort Regents w/o Advanced - % of grads Local - n Local - % of cohort Local - % of grads Still Enrolled - n Still Enrolled - % of cohort Dropped Out - n Dropped Out - % of cohort dbn school_name borough building_code phone_number fax_number grade_span_min grade_span_max expgrade_span_max bus subway primary_address_line_1 city state_code postcode website total_students campus_name school_type overview_paragraph program_highlights language_classes advancedplacement_courses online_ap_courses online_language_courses extracurricular_activities psal_sports_boys psal_sports_girls psal_sports_coed school_sports partner_cbo partner_hospital partner_highered partner_cultural partner_nonprofit partner_corporate partner_financial partner_other addtl_info1 addtl_info2 start_time end_time se_services ell_programs school_accessibility_description number_programs priority01 priority02 priority03 priority04 priority05 priority06 priority07 Location 1 Community Board Council District Census Tract BIN BBL NTA lat lon Grade Year Category Number 
Tested Mean Scale Score Level 1 # Level 1 % Level 2 # Level 2 % Level 3 # Level 3 % Level 4 # Level 4 % Level 3+4 # Level 3+4 % rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11 school_dist
0 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 29 355.0 404.0 363.0 1122.0 NaN 0.0 0.0 0.0 1.0 88.000000 4.000000 22.564286 18.500000 26.571429 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 20112012.0 88.6 422.0 32 33 50 98 79 80 50 94.0 22.3 105.0 24.9 34 35 59.0 14.0 123.0 29.1 227.0 53.8 7.0 1.7 259.0 61.4 163.0 38.6 Total Cohort HENRY STREET SCHOOL FOR INTERNATIONAL 2006 78.000000 43 55.100000 36 46.200000 83.700000 0 0.000000 0.000000 36 46.200000 83.700000 7 9.000000 16.300000 16 20.500000 11 14.100000 01M292 Henry Street School for International Studies Manhattan M056 212-406-9411 212-406-9417 6.000000 12.0 12.0 B39, M14A, M14D, M15, M15-SBS, M21, M22, M9 B, D to Grand St ; F to East Broadway ; J, M, ... 220 Henry Street New York NY 10002.00000 http://schools.nyc.gov/schoolportals/01/M292 323.00000 NaN NaN Henry Street School for International Studies ... Global/International Studies in core subjects,... Chinese (Mandarin), Spanish Psychology Chinese Language and Culture, Spanish Literatu... Chinese (Mandarin), Spanish Math through Card Play; Art, Poetry/Spoken Wor... Softball Softball Soccer Boxing, Track, CHAMPS, Tennis, Flag Football, ... The Henry Street Settlement; Asia Society; Ame... Gouverneur Hospital (Turning Points) New York University Asia Society Heart of America Foundation NaN NaN United Nations NaN NaN 8:30 AM 3:30 PM This school will provide students with disabil... ESL Functionally Accessible 1.000000 Priority to continuing 8th graders Then to Manhattan students or residents who at... Then to New York City residents who attend an ... Then to Manhattan students or residents Then to New York City residents NaN NaN 220 Henry Street\nNew York, NY 10002\n(40.7137... 3.000000 1.000000 201.000000 1.003223e+06 1.002690e+09 Lower East Side ... 
40.713764 -73.985260 8 2011.0 All Students 49.00000 650.000000 15.00000 30.600000 25.000000 51.000000 7.000000 14.300000 2.000000 4.100000 9.000000 18.400000 89.0 70.0 39.0 379.000000 26.0 151.0 7.8 7.7 7.4 7.6 6.3 5.3 6.1 6.5 6.000000 5.600000 6.100000 6.700000 6.7 6.2 6.6 7.0 01
1 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 91 383.0 423.0 366.0 1172.0 UNIVERSITY NEIGHBORHOOD H.S. 39.0 49.0 10.0 1.0 105.687500 4.750000 22.231250 18.250000 27.062500 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 20112012.0 71.8 394.0 109 97 93 95 83.0 21.1 86.0 21.8 55 10 115.0 29.2 89.0 22.6 181.0 45.9 9.0 2.3 226.0 57.4 168.0 42.6 Total Cohort UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 2006 124.000000 53 42.700000 42 33.900000 79.200000 8 6.500000 15.100000 34 27.400000 64.200000 11 8.900000 20.800000 46 37.100000 20 16.100000 01M448 University Neighborhood High School Manhattan M446 212-962-4341 212-267-5611 9.000000 12.0 12.0 M14A, M14D, M15, M21, M22, M9 F to East Broadway ; J, M, Z to Delancey St-Es... 200 Monroe Street New York NY 10002.00000 www.universityneighborhoodhs.com 299.00000 NaN NaN University Neighborhood High School (UNHS) is ... While attending UNHS, students can earn up to ... Chinese, Spanish Calculus AB, Chinese Language and Culture, Eng... NaN Chinese (Cantonese), Chinese (Mandarin), Spanish Basketball, Badminton, Handball, Glee, Dance, ... Basketball, Bowling, Cross Country, Softball, ... Basketball, Bowling, Cross Country, Softball, ... NaN NaN Grand Street Settlement, Henry Street Settleme... Gouverneur Hospital, The Door, The Mount Sinai... New York University, CUNY Baruch College, Pars... Dance Film Association, Dance Makers Film Work... W!SE, Big Brothers Big Sisters, Peer Health Ex... Deloitte LLP Consulting and Financial Services... NaN Movement Research Incoming students are expected to attend schoo... Community Service Requirement, Dress Code Requ... 8:15 AM 3:15 PM This school will provide students with disabil... ESL Not Functionally Accessible 3.000000 Open to New York City residents For M35B only: Open only to students whose hom... NaN NaN NaN NaN NaN 200 Monroe Street\nNew York, NY 10002\n(40.712... 3.000000 1.000000 202.000000 1.003214e+06 1.002590e+09 Lower East Side ... 
40.712332 -73.984797 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 84.0 95.0 10.0 385.000000 37.0 46.0 7.9 7.4 7.2 7.3 6.6 5.8 6.6 7.3 6.000000 5.700000 6.300000 7.000000 6.8 6.3 6.7 7.2 01
2 01M450 EAST SIDE COMMUNITY SCHOOL 70 377.0 402.0 370.0 1149.0 EAST SIDE COMMUNITY HS 19.0 21.0 0.0 1.0 57.600000 2.733333 21.200000 19.400000 22.866667 EAST SIDE COMMUNITY HIGH SCHOOL 20112012.0 71.8 598.0 92 73 76 101 93 77 86 30.0 5.0 158.0 26.4 91 19 58.0 9.7 143.0 23.9 331.0 55.4 62.0 10.4 327.0 54.7 271.0 45.3 Total Cohort EAST SIDE COMMUNITY SCHOOL 2006 90.000000 70 77.800000 67 74.400000 95.700000 0 0.000000 0.000000 67 74.400000 95.700000 3 3.300000 4.300000 15 16.700000 5 5.600000 01M450 East Side Community School Manhattan M060 212-460-8467 212-260-9657 6.000000 12.0 12.0 M101, M102, M103, M14A, M14D, M15, M15-SBS, M2... 6 to Astor Place ; L to 1st Ave 420 East 12 Street New York NY 10009.00000 www.eschs.org 649.00000 NaN Consortium School We are a small 6-12 secondary school that prep... Our Advisory System ensures that we can effect... NaN Calculus AB, English Literature and Composition NaN American Sign Language, Arabic, Chinese (Manda... After-School Tutoring, Art Portfolio Classes, ... Basketball, Soccer, Softball Basketball, Soccer, Softball NaN Basketball, Bicycling, Fitness, Flag Football,... University Settlement, Big Brothers Big Sister... NaN Columbia Teachers College, New York University... , Internship Program, Loisaida Art Gallery loc... College Bound Initiative, Center for Collabora... Prudential Securities, Moore Capital, Morgan S... NaN Brooklyn Boulders (Rock Climbing) Students present and defend their work to comm... Our school requires an Academic Portfolio for ... 8:30 AM 3:30 PM This school will provide students with disabil... ESL Not Functionally Accessible 1.000000 Priority to continuing 8th graders Then to New York City residents NaN NaN NaN NaN NaN 420 East 12 Street\nNew York, NY 10009\n(40.72... 3.000000 2.000000 34.000000 1.005974e+06 1.004390e+09 East Village ... 
40.729783 -73.983041 8 2011.0 All Students 55.00000 673.000000 2.00000 3.600000 24.000000 43.600000 27.000000 49.100000 2.000000 3.600000 29.000000 52.700000 0.0 98.0 28.0 516.257511 42.0 150.0 8.7 8.2 8.1 8.4 7.3 8.0 8.0 8.8 6.725751 6.166953 6.719313 7.429828 7.9 7.9 7.9 8.4 01
3 01M458 FORSYTH SATELLITE ACADEMY 7 414.0 401.0 359.0 1174.0 NaN 0.0 0.0 0.0 1.0 28.600000 1.200000 23.000000 22.600000 23.400000 SATELLITE ACADEMY HS @ FORSYTHE STREET 20112012.0 72.8 224.0 131 49 44 9.0 4.0 20.0 8.9 3 0 5.0 2.2 77.0 34.4 133.0 59.4 8.0 3.6 97.0 43.3 127.0 56.7 NaN NaN NaN 186.411311 NaN 63.199455 NaN 49.809537 74.757377 NaN 11.913079 15.222678 NaN 37.895095 59.534699 NaN 13.390463 25.243169 NaN 23.760763 NaN 9.951226 NaN NaN NaN NaN NaN NaN 8.457766 12.0 12.0 NaN NaN NaN NaN NaN 10725.96477 NaN 772.02168 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.821138 NaN NaN NaN NaN NaN NaN NaN NaN 6.782016 22.237057 3701.569482 2.587548e+06 2.515377e+09 NaN 40.743327 -73.924909 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 40.0 100.0 23.0 66.000000 10.0 37.0 8.1 7.0 6.7 7.6 8.5 8.2 8.9 8.9 6.800000 6.100000 6.100000 6.800000 7.8 7.1 7.2 7.8 01
4 01M509 MARTA VALLE HIGH SCHOOL 44 390.0 433.0 384.0 1207.0 NaN 0.0 0.0 0.0 1.0 69.642857 3.000000 23.571429 20.000000 27.357143 MARTA VALLE SECONDARY SCHOOL 20112012.0 80.7 367.0 143 100 51 73 41.0 11.2 95.0 25.9 28 36 34.0 9.3 116.0 31.6 209.0 56.9 6.0 1.6 170.0 46.3 197.0 53.7 Total Cohort MARTA VALLE HIGH SCHOOL 2006 84.000000 47 56.000000 40 47.600000 85.100000 17 20.200000 36.200000 23 27.400000 48.900000 7 8.300000 14.900000 25 29.800000 5 6.000000 01M509 Marta Valle High School Manhattan M025 212-473-8152 212-475-7588 9.000000 12.0 12.0 B39, M103, M14A, M14D, M15, M15-SBS, M21, M22,... B, D to Grand St ; F, J, M, Z to Delancey St-E... 145 Stanton Street New York NY 10002.00000 www.martavalle.org 401.00000 NaN NaN Marta Valle High School (MVHS) offers a strong... Advanced Regents Diploma, Early Graduation, up... French, Spanish English Literature and Composition, Studio Art... NaN Spanish Model Peer Leadership Program, 'The Vine' Stud... Rugby, Volleyball Rugby, Volleyball Rugby Volleyball, Zumba NYCDOE Innovation Zone Lab Site, Grand Street ... Gouvenuer's Hospital New York University (NYU), Sarah Lawrence Coll... Young Audiences, The National Arts Club, Educa... College for Every Student (CFES), Morningside ... Estée Lauder Bank of America CASALEAP, Beacon Students Dress for Success, Summer Bridge to S... Community Service Requirement, Extended Day Pr... 8:00 AM 3:30 PM This school will provide students with disabil... ESL Functionally Accessible 1.000000 Priority to District 1 students or residents Then to Manhattan students or residents Then to New York City residents NaN NaN NaN NaN 145 Stanton Street\nNew York, NY 10002\n(40.72... 3.000000 1.000000 3001.000000 1.004323e+06 1.003540e+09 Chinatown ... 
40.720569 -73.985673 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 90.0 100.0 21.0 306.000000 29.0 69.0 7.7 7.4 7.2 7.3 6.4 5.3 6.1 6.8 6.400000 5.900000 6.400000 7.000000 6.9 6.2 6.6 7.0 01
5 01M515 LOWER EAST SIDE PREPARATORY HIGH SCHOOL 112 332.0 557.0 316.0 1205.0 LOWER EASTSIDE PREP 24.0 26.0 24.0 1.0 131.117647 5.529412 22.876471 15.764706 28.588235 LOWER EAST SIDE PREPARATORY HIGH SCHOOL 20112012.0 77.0 562.0 261 209 92 453.0 80.6 7.0 1.2 0 2 476.0 84.7 29.0 5.2 50.0 8.9 5.0 0.9 302.0 53.7 260.0 46.3 Total Cohort LOWER EAST SIDE PREPARATORY HIGH SCHO 2006 193.000000 105 54.400000 91 47.200000 86.700000 69 35.800000 65.700000 22 11.400000 21.000000 14 7.300000 13.300000 53 27.500000 35 18.100000 NaN NaN NaN NaN NaN NaN 8.457766 12.0 12.0 NaN NaN NaN NaN NaN 10725.96477 NaN 772.02168 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.821138 NaN NaN NaN NaN NaN NaN NaN NaN 6.782016 22.237057 3701.569482 2.587548e+06 2.515377e+09 NaN 40.743327 -73.924909 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 95.0 100.0 86.0 454.000000 36.0 389.0 8.3 7.2 7.4 7.5 9.1 7.3 8.7 9.1 8.000000 6.300000 7.000000 7.300000 8.5 7.0 7.7 8.0 01
6 01M539 NEW EXPLORATIONS INTO SCIENCE, TECHNOLOGY AND ... 159 522.0 574.0 525.0 1621.0 NEW EXPLORATIONS SCI,TECH,MATH 255.0 377.0 191.0 1.0 156.368421 6.157895 25.510526 19.473684 31.210526 NEW EXPLORATIONS INTO SCIENCE TECH AND MATH 20112012.0 23.0 1613.0 100 107 139 110 114 107 149 126 117 117 123 147 157 4.0 0.2 43.0 2.7 2 0 448.0 27.8 189.0 11.7 229.0 14.2 725.0 44.9 794.0 49.2 819.0 50.8 Total Cohort NEW EXPLORATIONS INTO SCIENCE TECHNO 2006 46.000000 46 100.000000 46 100.000000 100.000000 31 67.400000 67.400000 15 32.600000 32.600000 0 0.000000 0.000000 0 0.000000 0 0.000000 01M539 New Explorations into Science, Technology and ... Manhattan M022 212-677-5190 212-260-8124 8.457766 12.0 12.0 B39, M14A, M14D, M21, M22, M8, M9 F, J, M, Z to Delancey St-Essex St 111 Columbia Street New York NY 10002.00000 www.nestmk12.net 1725.00000 NaN NaN New Explorations into Science, Technology and ... 1st level science sequence - 9th grade: Regent... Chinese (Mandarin), French, Italian, Latin, Sp... Biology, Calculus AB, Calculus BC, Chemistry, ... NaN NaN After-school Jazz Band, Annual Coffee House Co... Basketball, Fencing, Indoor Track Basketball, Fencing, Indoor Track NaN Badminton, Baseball, Cross-Country, Dance, Out... 7th Precinct Community Affairs, NYCWastele$$, ... NaN Hunter College, New York University, Cornell U... VH1, Dancing Classrooms, Center for Arts Educa... After 3 Time Warner Cable, Google, IBM, MET Project, S... NaN NaN Dress Code Required: Business Casual - shirt/b... NaN 8:15 AM 4:00 PM This school will provide students with disabil... ESL Not Functionally Accessible 1.000000 Priority to continuing 8th graders Then to New York City residents NaN NaN NaN NaN NaN 111 Columbia Street\nNew York, NY 10002\n(40.7... 3.000000 2.000000 2201.000000 1.004070e+06 1.003350e+09 Lower East Side ... 
40.718725 -73.979426 8 2011.0 All Students 142.00000 724.000000 0.00000 0.000000 1.000000 0.700000 22.000000 15.500000 119.000000 83.800000 141.000000 99.300000 98.0 68.0 51.0 923.000000 67.0 736.0 8.5 7.9 7.9 8.4 7.6 5.6 5.9 7.3 7.300000 6.400000 7.000000 7.700000 7.8 6.7 6.9 7.8 01
7 01M650 CASCADES HIGH SCHOOL 18 417.0 418.0 411.0 1246.0 NaN 0.0 0.0 0.0 1.0 64.125000 2.937500 21.781250 18.687500 24.750000 CASCADES HIGH SCHOOL 20112012.0 69.8 218.0 5 89 59 65 7.0 3.2 15.0 6.9 1 0 1.0 0.5 99.0 45.4 108.0 49.5 9.0 4.1 87.0 39.9 131.0 60.1 Total Cohort CASCADES HIGH SCHOOL 2006 89.000000 49 55.100000 36 40.400000 73.500000 0 0.000000 0.000000 36 40.400000 73.500000 13 14.600000 26.500000 34 38.200000 6 6.700000 NaN NaN NaN NaN NaN NaN 8.457766 12.0 12.0 NaN NaN NaN NaN NaN 10725.96477 NaN 772.02168 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.821138 NaN NaN NaN NaN NaN NaN NaN NaN 6.782016 22.237057 3701.569482 2.587548e+06 2.515377e+09 NaN 40.743327 -73.924909 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 75.0 86.0 19.0 125.000000 12.0 30.0 9.0 8.4 8.1 8.6 7.6 7.5 8.3 8.7 8.100000 6.900000 7.900000 8.400000 8.3 7.6 8.1 8.6 01
8 01M696 BARD HIGH SCHOOL EARLY COLLEGE 130 624.0 604.0 628.0 1856.0 NaN 0.0 0.0 0.0 1.0 214.166667 10.250000 20.975000 17.166667 24.250000 BARD HIGH SCHOOL EARLY COLLEGE 20112012.0 18.0 617.0 184 162 128 143 1.0 0.2 5.0 0.8 0 0 93.0 15.1 93.0 15.1 112.0 18.2 307.0 49.8 193.0 31.3 424.0 68.7 Total Cohort BARD HIGH SCHOOL EARLY COLLEGE 2006 139.000000 134 96.400000 134 96.400000 100.000000 0 0.000000 0.000000 134 96.400000 100.000000 0 0.000000 0.000000 4 2.900000 1 0.700000 01M696 Bard High School Early College Manhattan M097 212-995-8479 212-777-4702 9.000000 12.0 12.0 M14A, M14D, M21, M22, M9 NaN 525 East Houston Street New York NY 10002.00000 www.bard.edu/bhsec 560.00000 NaN NaN Bard High School Early College Manhattan (BHSE... In the first two years at BHSEC, students unde... Chinese (Mandarin), Latin, Spanish NaN NaN NaN Bard Bulletin online newspaper, Bardvark stude... Basketball, Soccer, Tennis, Volleyball Basketball, Soccer, Tennis, Volleyball Outdoor Track Co-ed Ultimate Frisbee Lower East Side Girls Club, Third Street Music... NaN Bard College, Bard College at Simon's Rock, Ro... American Symphony Orchestra, American Museum o... NaN NaN NaN New York Academy of Sciences NaN Student Summer Orientation 9:00 AM 3:50 PM This school will provide students with disabil... ESL Not Functionally Accessible 1.000000 Open to New York City residents NaN NaN NaN NaN NaN NaN 525 East Houston Street\nNew York, NY 10002\n(... 3.000000 2.000000 1002.000000 1.004062e+06 1.003250e+09 Lower East Side ... 40.718962 -73.976066 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 98.0 81.0 50.0 561.000000 30.0 283.0 8.8 8.2 8.3 9.1 8.2 7.4 7.5 8.3 8.300000 7.300000 8.000000 8.900000 8.5 7.6 8.0 8.7 01
9 02M047 47 THE AMERICAN SIGN LANGUAGE AND ENGLISH SECO... 16 395.0 400.0 387.0 1182.0 NaN 0.0 0.0 0.0 2.0 26.818182 1.636364 16.072727 15.090909 17.090909 47 THE AMERICAN SIGN LANGUAGE AND ENGLISH DUAL L 20112012.0 66.9 174.0 50 63 38 23 14.0 8.0 56.0 32.2 34 10 3.0 1.7 56.0 32.2 103.0 59.2 11.0 6.3 74.0 42.5 100.0 57.5 Total Cohort 47 THE AMERICAN SIGN LANGUAGE AND ENG 2006 25.000000 19 76.000000 8 32.000000 42.100000 0 0.000000 0.000000 8 32.000000 42.100000 11 44.000000 57.900000 4 16.000000 1 4.000000 02M047 47 The American Sign Language and English Seco... Manhattan M047 917-326-6668 917-326-6688 9.000000 12.0 12.0 M101, M102, M14A, M14D, M15, M15-SBS, M2, M23,... 4, 5, Q to 14th St-Union Square ; 6, N, R to 2... 223 East 23 Street New York NY 10010.00000 www.47aslhs.org 184.00000 NaN NaN In addition to the New York State Regents curr... Small class sizes, Pre-College Now for ninth g... American Sign Language NaN NaN NaN Academic Bowl, ACT and SAT Test Preparation, A... NaN NaN NaN Basketball, Cross Country, Track, Volleyball WRI-Welfare Rights Initiative at Hunter Colleg... NaN New York University (NYU), The City University... Children's Museum of Manhattan, Third Street M... NaN PENCIL NaN NaN Our student body is composed of deaf, hard-of-... Community Service Requirement, Extended Day Pr... 8:45 AM 3:45 PM This school will provide students with disabil... ESL Functionally Accessible 1.000000 Priority to “47” American Sign Language & Engl... Then to New York City residents who know or ar... NaN NaN NaN NaN NaN 223 East 23 Street\nNew York, NY 10010\n(40.73... 6.000000 2.000000 64.000000 2.587548e+06 2.515377e+09 Gramercy ... 40.738599 -73.982512 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 72.0 76.0 30.0 109.000000 16.0 43.0 8.9 7.7 7.9 8.1 8.1 6.1 7.7 7.2 7.300000 6.300000 7.000000 7.500000 8.1 6.7 7.5 7.6 02
10 02M288 FOOD AND FINANCE HIGH SCHOOL 62 409.0 393.0 392.0 1194.0 NaN 0.0 0.0 0.0 2.0 88.500000 3.916667 22.683333 19.333333 26.166667 FOOD AND FINANCE HIGH SCHOOL 20112012.0 68.4 433.0 135 122 88 88 9.0 2.1 80.0 18.5 69 4 13.0 3.0 191.0 44.1 206.0 47.6 21.0 4.8 195.0 45.0 238.0 55.0 Total Cohort FOOD AND FINANCE HIGH SCHOOL 2006 102.000000 91 89.200000 77 75.500000 84.600000 0 0.000000 0.000000 77 75.500000 84.600000 14 13.700000 15.400000 5 4.900000 3 2.900000 02M288 Food and Finance High School Manhattan M535 212-586-2943 212-586-4205 9.000000 12.0 12.0 M104, M11, M31, M34A-SBS, M42, M50, M57 C, E to 50th St 525 West 50Th Street New York NY 10019.00000 www.foodfinancehs.org 443.00000 Park West Educational Campus CTE School We offer an academically challenging Career an... Students may receive CTE endorsement in Culina... Spanish English Language and Composition, Environmenta... NaN NaN 4-H, Anime, Arts, Baking, Cornell Nutrition, D... Basketball & JV Basketball, Handball, JV Softball Basketball & JV Basketball, Handball, JV Softball Bowling, Tennis Step Team, Yoga New York State Restaurant Association, New Yor... NaN Cornell University Cooperative Extension, King... American Museum of Natural History, Museum of ... Food Education Fund, Youth Service Opportunity... Food Network Municipal Credit Union Celebrity Chefs: Marc Murphy, Scott Conant, Am... Uniform Required: Solid white or blue school p... NaN 7:40 AM 2:30 PM This school will provide students with disabil... ESL Functionally Accessible 1.000000 Priority to New York City residents who attend... Then to New York City residents NaN NaN NaN NaN NaN 525 West 50Th Street\nNew York, NY 10019\n(40.... 4.000000 3.000000 135.000000 1.083802e+06 1.010790e+09 Clinton ... 
40.765027 -73.992517 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 86.0 100.0 57.0 348.000000 35.0 229.0 7.6 7.0 6.9 7.6 7.3 7.1 7.8 7.7 6.200000 5.700000 6.100000 7.200000 7.0 6.6 6.9 7.5 02
11 02M294 ESSEX STREET ACADEMY 53 394.0 384.0 378.0 1156.0 NaN 0.0 0.0 0.0 2.0 65.000000 4.357143 14.900000 12.285714 17.857143 HIGH SCHOOL FOR HISTORY & COMMUNICATION 20112012.0 60.8 343.0 93 97 69 84 13.0 3.8 70.0 20.4 41 2 14.0 4.1 95.0 27.7 208.0 60.6 26.0 7.6 183.0 53.4 160.0 46.6 Total Cohort ESSEX STREET ACADEMY 2006 89.000000 64 71.900000 58 65.200000 90.600000 0 0.000000 0.000000 58 65.200000 90.600000 6 6.700000 9.400000 15 16.900000 9 10.100000 02M294 Essex Street Academy Manhattan M445 212-475-4773 212-674-2058 9.000000 12.0 12.0 B39, M103, M14A, M14D, M15, M15-SBS, M21, M22, M9 B, D to Grand St ; F, J, M, Z to Delancey St-E... 350 Grand Street New York NY 10002.00000 www.essexstreetacademy.org 349.00000 Seward Park Educational Campus Consortium School Essex Street Academy prepares all students for... 9th-10th grade courses include: Ecology, Genet... French, Spanish Spanish Language and Culture NaN NaN After-school and Saturday Program: Homework he... Basketball, Bowling, Tennis, Volleyball Basketball, Bowling, Tennis, Volleyball NaN Boys Intramural Basketball Team Greenwich Village Youth Council NaN New York University, Parsons School of Design,... The Possibility Project Volunteers of Legal Services Corporation, New ... Ramius Capital Group, LLC; Hughes Hubbard & Re... NaN ESI Summer Bridge Program for incoming ninth g... Essex Street Academy (ESA) is a member of the ... NaN 8:00 AM 2:45 PM This school will provide students with disabil... ESL Not Functionally Accessible 1.000000 Priority to New York City residents who attend... Then to New York City residents NaN NaN NaN NaN NaN 350 Grand Street\nNew York, NY 10002\n(40.7168... 3.000000 1.000000 18.000000 1.005283e+06 1.004080e+09 Chinatown ... 
40.716867 -73.989532 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 48.0 56.0 9.0 150.000000 15.0 28.0 8.7 8.1 7.9 8.3 8.0 7.7 7.9 8.9 7.400000 6.500000 7.300000 7.600000 7.9 7.3 7.7 8.2 02
12 02M296 HIGH SCHOOL OF HOSPITALITY MANAGEMENT 58 374.0 375.0 362.0 1111.0 High School of Hospitality Management 0.0 0.0 0.0 2.0 100.000000 4.428571 22.964286 16.214286 27.571429 HIGH SCHOOL OF HOSPITALITY MANAGEMENT 20112012.0 72.4 419.0 154 101 79 85 40.0 9.5 62.0 14.8 30 17 22.0 5.3 111.0 26.5 277.0 66.1 9.0 2.1 135.0 32.2 284.0 67.8 Total Cohort HIGH SCHOOL OF HOSPITALITY MANAGEMENT 2006 75.000000 58 77.300000 50 66.700000 86.200000 1 1.300000 1.700000 49 65.300000 84.500000 8 10.700000 13.800000 11 14.700000 5 6.700000 02M296 High School of Hospitality Management Manhattan M535 212-586-1819 212-586-2713 9.000000 12.0 12.0 M104, M11, M31, M34A-SBS, M42, M50, M57 C, E to 50th St 525 West 50Th Street New York NY 10019.00000 http://schools.nyc.gov/schoolportals/02/M296 431.00000 Park West Educational Campus NaN The High School of Hospitality Management (HSH... College Preparatory Classes Scheduled in Block... Italian English Literature and Composition, United Sta... NaN NaN Book, Chess, Culinary Arts, Digital Learning, ... Basketball & JV Basketball, Soccer, Tennis, Vo... Basketball & JV Basketball, Soccer, Tennis, Vo... Bowling, Handball Intramural Basketball, Flag Football ASPIRA, Jewish Board Youth Council League St. Luke’s-Roosevelt Hospital Center Kingsborough Community College; Hunter College... Museum of Arts and Design Peer Health Exchange NaN NaN National Academy Foundation; American Place Th... Dress Code Required: navy blue or burgundy col... NaN 9:00 AM 3:45 PM This school will provide students with disabil... ESL Functionally Accessible 1.000000 Priority to New York City residents who attend... Then to New York City residents NaN NaN NaN NaN NaN 525 West 50Th Street\nNew York, NY 10019\n(40.... 4.000000 3.000000 135.000000 1.083802e+06 1.010790e+09 Clinton ... 
40.765027 -73.992517 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 61.0 96.0 56.0 233.000000 27.0 204.0 8.0 7.3 7.1 7.5 8.6 8.1 8.7 8.9 7.100000 6.500000 7.000000 7.400000 7.9 7.3 7.6 8.0 02
13 02M298 PACE HIGH SCHOOL 85 423.0 438.0 432.0 1293.0 Pace High School 21.0 21.0 0.0 2.0 74.750000 3.625000 21.312500 18.000000 25.062500 PACE HIGH SCHOOL 20112012.0 56.7 414.0 118 104 99 93 9.0 2.2 57.0 13.8 45 3 42.0 10.1 134.0 32.4 174.0 42.0 16.0 3.9 181.0 43.7 233.0 56.3 Total Cohort PACE HIGH SCHOOL 2006 92.000000 78 84.800000 76 82.600000 97.400000 39 42.400000 50.000000 37 40.200000 47.400000 2 2.200000 2.600000 6 6.500000 4 4.300000 02M298 Pace High School Manhattan M131 212-334-4663 212-334-4919 9.000000 12.0 12.0 B39, M103, M14A, M15, M15-SBS, M22, M5, M9 6, N, Q, R to Canal St ; B, D to Grand St ; F ... 100 Hester Street New York NY 10002.00000 http://schools.nyc.gov/schoolportals/02/M298 421.00000 NaN NaN The PACE High School mission is to create a co... Advisory Program, Advanced Regents classes thr... Spanish Calculus AB, Environmental Science, United Sta... Biology, Chemistry, Chinese Language and Cultu... NaN Academic Tutorials, Advisory Debates, After-sc... Basketball, Outdoor Track, Softball & JV Softb... Basketball, Outdoor Track, Softball & JV Softb... Cross Country Basketball, Bridge Running, Flag Football, Han... Minds Matter, New York Cares, Chinatown Young ... NaN Pace University Lincoln Center for the Arts, Working Playgroun... Big Brothers Big Sisters, Materials for the Ar... Time Warner, Inc. NaN Dr. Sun Yat Sen Middle School (MS 131), Emma L... All incoming 9th grade students participate in... NaN 9:00 AM 3:15 PM This school will provide students with disabil... ESL Functionally Accessible 1.000000 Priority to New York City residents who attend... Then to New York City residents NaN NaN NaN NaN NaN 100 Hester Street\nNew York, NY 10002\n(40.716... 3.000000 1.000000 16.000000 1.082489e+06 1.003010e+09 Chinatown ... 
40.716412 -73.992676 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 96.0 100.0 46.0 392.000000 29.0 184.0 7.5 7.1 6.9 7.5 6.6 6.3 6.8 7.1 6.600000 6.200000 6.700000 7.500000 6.9 6.6 6.8 7.4 02
14 02M300 URBAN ASSEMBLY SCHOOL OF DESIGN AND CONSTRUCTI... 48 404.0 449.0 416.0 1269.0 Urban Assembly School of Design and Construction, 99.0 117.0 10.0 2.0 62.250000 2.916667 22.641667 18.166667 27.583333 URBAN ASSEMBLY SCHOOL OF DESIGN AND CONSTRUCTION 20112012.0 75.3 431.0 140 106 109 76 47.0 10.9 86.0 20.0 63 12 26.0 6.0 126.0 29.2 255.0 59.2 21.0 4.9 323.0 74.9 108.0 25.1 Total Cohort URBAN ASSEMBLY SCHOOL OF DESIGN AND C 2006 80.000000 49 61.300000 45 56.300000 91.800000 0 0.000000 0.000000 45 56.300000 91.800000 4 5.000000 8.200000 24 30.000000 7 8.800000 02M300 Urban Assembly School of Design and Constructi... Manhattan M535 212-586-0981 212-586-1731 9.000000 12.0 12.0 M104, M11, M31, M34A-SBS, M42, M50, M57 C, E to 50th St 525 West 50Th Street New York NY 10019.00000 www.uasdc.org 427.00000 Park West Educational Campus NaN The Urban Assembly School of Design and Constr... Four years of English, math, science and socia... German, Spanish Calculus AB, English Language and Composition,... NaN NaN After-school Field Trips and Office Tours, Aft... Basketball & JV Basketball, Handball, Volleyball Basketball & JV Basketball, Handball, Volleyball Bowling, Cross Country, Tennis Boys Basketball Girls Softball Club Girls Socc... Youth Counseling League NaN PRATT Institute The Center for Architecture, The Salvadori Cen... The Urban Assembly Turner Construction Co.; NYC Department of Des... BNP Paribas American Council of Engineering Companies (New... Dress Code Required: Boys - button-down dress ... NaN 8:00 AM 3:15 PM This school will provide students with disabil... ESL Functionally Accessible 1.000000 Priority to New York City residents who attend... Then to New York City residents NaN NaN NaN NaN NaN 525 West 50Th Street\nNew York, NY 10019\n(40.... 4.000000 3.000000 135.000000 1.083802e+06 1.010790e+09 Clinton ... 
40.765027 -73.992517 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 92.0 85.0 34.0 349.000000 23.0 124.0 8.6 7.9 7.7 8.1 7.3 6.0 6.9 7.5 6.500000 5.700000 6.100000 7.200000 7.5 6.5 6.9 7.6 02
15 02M303 FACING HISTORY SCHOOL, THE 76 353.0 358.0 340.0 1051.0 Facing History School, The 42.0 44.0 0.0 2.0 62.500000 3.000000 21.850000 18.500000 26.166667 THE FACING HISTORY SCHOOL 20112012.0 75.6 451.0 185 115 93 58 94.0 20.8 98.0 21.7 78 9 3.0 0.7 106.0 23.5 330.0 73.2 11.0 2.4 228.0 50.6 223.0 49.4 Total Cohort FACING HISTORY SCHOOL THE 2006 71.000000 34 47.900000 34 47.900000 100.000000 0 0.000000 0.000000 34 47.900000 100.000000 0 0.000000 0.000000 21 29.600000 14 19.700000 02M303 Facing History School, The Manhattan M535 212-757-2680 212-757-2156 9.000000 12.0 12.0 M104, M11, M31, M34A-SBS, M42, M50, M57 C, E to 50th St 525 West 50Th Street New York NY 10019.00000 www.facinghistoryschool.org 417.00000 Park West Educational Campus Consortium School At FHS, we graduate lifelong learners who are ... Performance Based Assessment School, Internshi... Spanish English Literature and Composition, United Sta... NaN NaN After-School Tutoring, Art Crew, Photography, ... Basketball, Bowling, Handball, Soccer, Softbal... Basketball, Bowling, Handball, Soccer, Softbal... NaN Activities to be formed based on student interest CityKids, Peer Health Exchange Mount Sinai Hospital New York University (NYU) Steinhardt School of... Facing History and Ourselves, Urban Arts Partn... School Reform Initiative, Fund for Public Scho... NaN NaN NaN Dress Code Required: white/black/navy/brow col... Community Service Requirement, Extended Day Pr... 8:35 AM 3:00 PM This school will provide students with disabil... ESL Functionally Accessible 1.000000 Priority to District 3 students or residents w... Then to Manhattan students or residents who at... Then to New York City residents who attend an ... Then to District 3 students or residents Then to Manhattan students or residents Then to New York City residents NaN 525 West 50Th Street\nNew York, NY 10019\n(40.... 4.000000 3.000000 135.000000 1.083802e+06 1.010790e+09 Clinton ... 
40.765027 -73.992517 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 83.0 100.0 53.0 319.000000 36.0 200.0 8.2 7.8 7.6 8.0 7.4 7.4 7.7 7.9 7.000000 6.600000 6.800000 7.400000 7.5 7.3 7.3 7.8 02
16 02M305 URBAN ASSEMBLY ACADEMY OF GOVERNMENT AND LAW, THE 50 375.0 388.0 385.0 1148.0 Urban Assembly Academy of Government and Law, The 25.0 37.0 15.0 2.0 77.375000 3.750000 20.800000 17.625000 24.000000 THE URBAN ASSEMBLY ACADEMY OF GOVERNMENT AND LAW 20112012.0 68.3 312.0 118 67 74 53 19.0 6.1 53.0 17.0 40 5 12.0 3.8 130.0 41.7 158.0 50.6 11.0 3.5 137.0 43.9 175.0 56.1 Total Cohort URBAN ASSEMBLY ACADEMY OF GOVERNMENT 2006 61.000000 46 75.400000 40 65.600000 87.000000 0 0.000000 0.000000 40 65.600000 87.000000 6 9.800000 13.000000 3 4.900000 10 16.400000 02M305 Urban Assembly Academy of Government and Law, The Manhattan M445 212-505-0745 212-674-8021 9.000000 12.0 12.0 B39, M103, M14A, M14D, M15, M15-SBS, M21, M22, M9 B, D to Grand St ; F, J, M, Z to Delancey St-E... 350 Grand Street New York NY 10002.00000 www.uaagl.org 326.00000 Seward Park Educational Campus NaN The Urban Assembly Academy of Government and L... Students take courses such as Law & Ethics, Fo... Spanish English Literature and Composition, Microecono... NaN NaN Student Government, Art, Basketball, Cheerlead... Basketball, Bowling, Tennis, Volleyball Basketball, Bowling, Tennis, Volleyball NaN AGL Intramural Basketball, Soccer, Cheerleading Henry Street Settlement, Peer Health Exchange,... NaN New York University Law School, NYU Wagner Sch... Abrons Art Center (provides drama instruction) The Urban Assembly Jones Day Law Firm NaN NaN Uniform Required: blue polo shirt with school ... Extended Day Program Requirement 8:32 AM 3:45 PM This school will provide students with disabil... ESL Not Functionally Accessible 1.000000 Priority to Districts 1 and 2 students or resi... Then to Manhattan students or residents who at... Then to New York City residents who attend an ... Then to Districts 1 and 2 students or residents Then to Manhattan students or residents Then to New York City residents NaN 350 Grand Street\nNew York, NY 10002\n(40.7168... 
3.000000 1.000000 18.000000 1.005283e+06 1.004080e+09 Chinatown ... 40.716867 -73.989532 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 82.0 79.0 4.0 223.000000 15.0 12.0 7.5 7.4 7.3 7.7 5.4 4.3 5.8 6.0 6.000000 6.000000 6.000000 7.200000 6.1 5.6 6.2 6.8 02
17 02M308 LOWER MANHATTAN ARTS ACADEMY 40 403.0 392.0 405.0 1200.0 Lower Manhattan Arts Academy 0.0 0.0 0.0 2.0 80.000000 3.916667 23.250000 20.750000 26.000000 LOWER MANHATTAN ARTS ACADEMY 20112012.0 62.2 336.0 124 90 75 47 13.0 3.9 83.0 24.7 47 3 32.0 9.5 103.0 30.7 187.0 55.7 12.0 3.6 134.0 39.9 202.0 60.1 Total Cohort LOWER MANHATTAN ARTS ACADEMY 2006 68.000000 46 67.600000 32 47.100000 69.600000 1 1.500000 2.200000 31 45.600000 67.400000 14 20.600000 30.400000 15 22.100000 6 8.800000 02M308 Lower Manhattan Arts Academy Manhattan M445 212-505-0143 212-674-8021 9.000000 12.0 12.0 B39, M103, M14A, M14D, M15, M15-SBS, M21, M22, M9 B, D to Grand St ; F, J, M, Z to Delancey St-E... 350 Grand Street New York NY 10002.00000 www.lomanyc.net 362.00000 Seward Park Educational Campus NaN The Lower Manhattan Arts Academy (LoMA) is a s... Twelve-hour-per-week internships for all senio... Spanish Calculus AB, Physics B English Language and Composition, Spanish Lang... NaN LoMA students are required to participate in t... Basketball, Bowling, Handball, Tennis, Volleyball Basketball, Bowling, Handball, Tennis, Volleyball NaN Intramural Sports: Basketball, Soccer, Swimming Educational Alliance and Grand Street Settleme... Harlem Hospital (Internships) New York University and John Jay College (coll... Henry Street Settlement/Abrons Arts Center, Ne... NaN NaN NaN NaN NaN Internship Requirement, Our school requires an... 8:30 AM 3:00 PM This school will provide students with disabil... ESL Not Functionally Accessible 1.000000 Priority to Districts 1 and 2 students or resi... Then to Manhattan students or residents who at... Then to New York City residents who attend an ... Then to Districts 1 and 2 students or residents Then to Manhattan students or residents Then to New York City residents NaN 350 Grand Street\nNew York, NY 10002\n(40.7168... 3.000000 1.000000 18.000000 1.005283e+06 1.004080e+09 Chinatown ... 
40.716867 -73.989532 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 78.0 54.0 15.0 240.000000 13.0 45.0 7.8 7.4 7.3 7.4 7.4 7.5 7.7 8.0 6.600000 5.800000 6.700000 7.300000 7.3 6.9 7.2 7.6 02
18 02M313 JAMES BALDWIN SCHOOL, THE: A SCHOOL FOR EXPEDI... 69 408.0 390.0 390.0 1188.0 NaN 0.0 0.0 0.0 2.0 49.571429 2.214286 21.921429 19.785714 24.071429 THE JAMES BALDWIN SCHOOL: A SCHOOL FOR EXPEDITIO 20112012.0 51.0 248.0 15 74 60 99 12.0 4.8 32.0 12.9 17 1 10.0 4.0 85.0 34.3 135.0 54.4 16.0 6.5 130.0 52.4 118.0 47.6 Total Cohort JAMES BALDWIN SCHOOL THE: A SCHOOL F 2006 79.000000 41 51.900000 30 38.000000 73.200000 0 0.000000 0.000000 30 38.000000 73.200000 11 13.900000 26.800000 23 29.100000 12 15.200000 NaN NaN NaN NaN NaN NaN 8.457766 12.0 12.0 NaN NaN NaN NaN NaN 10725.96477 NaN 772.02168 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.821138 NaN NaN NaN NaN NaN NaN NaN NaN 6.782016 22.237057 3701.569482 2.587548e+06 2.515377e+09 NaN 40.743327 -73.924909 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 70.0 100.0 27.0 146.000000 18.0 55.0 8.5 8.0 7.8 7.9 7.5 7.5 7.9 8.3 7.600000 7.400000 7.400000 8.200000 7.9 7.6 7.7 8.1 02
19 02M316 URBAN ASSEMBLY SCHOOL OF BUSINESS FOR YOUNG WO... 42 373.0 370.0 384.0 1127.0 NaN 0.0 0.0 0.0 2.0 92.500000 3.583333 26.316667 23.500000 27.750000 THE URBAN ASSEMBLY SCHOOL OF BUSINESS FOR YOUNG 20112012.0 74.9 378.0 129 118 77 54 10.0 2.6 59.0 15.6 41 2 13.0 3.4 177.0 46.8 183.0 48.4 5.0 1.3 0.0 0.0 378.0 100.0 Total Cohort URBAN ASSEMBLY SCHOOL OF BUSINESS FOR YOUNG WOMEN 2006 72.000000 53 73.600000 44 61.100000 83.000000 8 11.100000 15.100000 36 50.000000 67.900000 9 12.500000 17.000000 13 18.100000 4 5.600000 02M316 Urban Assembly School of Business for Young Wo... Manhattan M282 212-668-0169 212-668-0635 9.000000 12.0 12.0 M15, M15-SBS, M20, M5, M9, S1115 1, R to Rector St ; 1, R to Whitehall St-South... 26 Broadway New York NY 10004.00000 www.uasbyw.org 412.00000 Broadway Educational Campus All-Girls School We provide a high-quality education to young w... College Preparatory Courses, Advisory Program,... Spanish Chemistry, Spanish Language and Culture, Unite... Chemistry NaN Business Trips, Community Service, Dance, Digi... Basketball Basketball NaN Basketball, Golf, Track, Volleyball CItyKids Foundation, Girls Inc., The New York ... NaN New York University - Leonard N. Stern School ... NaN The Urban Assembly The Women's Bond Club, Goldman Sachs, Time Inc... The Federal Reserve Bank, New York State Banki... Junior Achievement Uniform Required: white short/long-sleeved oxf... NaN 8:35 AM 3:30 PM This school will provide students with disabil... ESL Functionally Accessible 1.000000 Open only to female students Priority to Manhattan students or residents wh... Then to New York City residents who attend an ... Then to Manhattan students or residents Then to New York City residents NaN NaN 26 Broadway\nNew York, NY 10004\n(40.705234939... 1.000000 1.000000 9.000000 1.000811e+06 1.000220e+09 Battery Park City-Lower Manhattan ... 
40.705235 -74.013315 NaN 2011.0 NaN 71.13253 670.926829 6.97561 11.715854 26.207317 38.103659 27.268293 36.693902 11.536585 13.482927 38.804878 50.176829 86.0 95.0 61.0 294.000000 21.0 206.0 8.4 7.6 7.5 7.9 7.0 6.2 6.5 7.4 6.800000 6.400000 6.700000 7.600000 7.4 6.7 6.9 7.6 02
(421, 175)

Computing correlations

As a first measure of how the columns relate to one another, their correlations can be computed. Correlation describes how strongly two variables vary together, and it is quantified by a correlation coefficient.

In this case, the Pearson correlation coefficient (or Pearson's r) will be used. It measures the strength of the linear relationship between two variables and takes values between -1 and 1. A value of 0 means no linear correlation; the closer the value is to 1, the stronger the positive correlation, and the closer it is to -1, the stronger the negative correlation.
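To make the definition concrete, here is a minimal sketch that computes Pearson's r by hand: the sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. The function name pearson_r and the sample values are made up for illustration and are not part of the notebook's datasets.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r computed directly from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up illustrative values, not taken from the datasets
hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]    # rises exactly linearly with hours
score_down = [10, 8, 6, 4, 2]  # falls exactly linearly with hours

r_up = pearson_r(hours, score_up)      # perfect positive correlation
r_down = pearson_r(hours, score_down)  # perfect negative correlation
print(r_up, r_down)
```

A perfectly increasing linear relationship gives a coefficient of 1 and a perfectly decreasing one gives -1; real data such as the school columns falls somewhere in between.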

The .corr() method calculates the Pearson correlation coefficient by default. It works on numeric columns and excludes missing values pair by pair.
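
The behaviour described above can be tried on a small example. The toy DataFrame below is invented for illustration; it assumes pandas 1.5 or later, where the numeric_only argument of .corr() is available. It also shows that a column with a single repeated value gets NaN, which may be why a few rows in the output further down show NaN (that explanation is an assumption, not something checked against the data).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps, a constant column and a text column
toy = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, np.nan],
    'b': [2.0, 4.0, 6.0, np.nan, 10.0],
    'constant': [7.0, 7.0, 7.0, 7.0, 7.0],
    'label': ['v', 'w', 'x', 'y', 'z'],
})

# numeric_only=True leaves the text column out of the computation
corr = toy.corr(numeric_only=True)

# 'a' and 'b' are compared only on the rows where both are present
print(corr.loc['a', 'b'])
# A constant column has zero variance, so its correlations are undefined (NaN)
print(corr['constant'])
```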

The code below shows the Pearson correlation coefficient for each column against the sat_score column.

In [28]:
with pd.option_context('display.max_columns', 500,'display.max_rows', 500):
  # numeric_only=True skips text columns; pandas 2.0+ raises an error without it
  display(full.corr(numeric_only=True)['sat_score'])
SAT Critical Reading Avg. Score         0.974758
SAT Math Avg. Score                     0.953011
SAT Writing Avg. Score                  0.981016
sat_score                               1.000000
AP Test Takers                          0.563030
Total Exams Taken                       0.551654
Number of Exams with scores 3 4 or 5    0.549680
CSD                                     0.052829
NUMBER OF STUDENTS / SEATS FILLED       0.391292
NUMBER OF SECTIONS                      0.356518
AVERAGE CLASS SIZE                      0.387248
SIZE OF SMALLEST CLASS                  0.276166
SIZE OF LARGEST CLASS                   0.319893
schoolyear                                   NaN
frl_percent                            -0.718304
total_enrollment                        0.376574
ell_num                                -0.130235
ell_percent                            -0.374965
sped_num                                0.055834
sped_percent                           -0.421215
asian_num                               0.473160
asian_per                               0.543154
black_num                               0.037422
black_per                              -0.308924
hispanic_num                            0.049189
hispanic_per                           -0.361786
white_num                               0.456479
white_per                               0.649289
male_num                                0.337768
male_per                               -0.102907
female_num                              0.393982
female_per                              0.102958
Total Cohort                            0.296345
Total Grads - % of cohort               0.524765
Total Regents - % of cohort             0.612126
Total Regents - % of grads              0.451123
Advanced Regents - % of cohort          0.749388
Advanced Regents - % of grads           0.712301
Regents w/o Advanced - % of cohort      0.009631
Regents w/o Advanced - % of grads      -0.317614
Local - % of cohort                    -0.381332
Local - % of grads                     -0.451128
Still Enrolled - % of cohort           -0.413099
Dropped Out - % of cohort              -0.461606
grade_span_min                         -0.034654
grade_span_max                               NaN
expgrade_span_max                            NaN
postcode                               -0.070223
total_students                          0.393046
number_programs                         0.113217
Community Board                        -0.061968
Council District                       -0.084622
Census Tract                            0.046450
BIN                                     0.046313
BBL                                     0.038303
lat                                    -0.120366
lon                                    -0.134625
Year                                         NaN
Number Tested                           0.090799
Mean Scale Score                        0.276499
Level 1 #                              -0.196813
Level 1 %                              -0.223366
Level 2 #                              -0.132738
Level 2 %                              -0.255521
Level 3 #                               0.159075
Level 3 %                               0.156555
Level 4 #                               0.230596
Level 4 %                               0.253697
Level 3+4 #                             0.226437
Level 3+4 %                             0.264714
rr_s                                    0.291085
rr_t                                    0.012383
rr_p                                    0.112088
N_s                                     0.429746
N_t                                     0.300465
N_p                                     0.434850
saf_p_11                                0.111752
com_p_11                               -0.092124
eng_p_11                                0.030599
aca_p_11                                0.031760
saf_t_11                                0.301001
com_t_11                                0.091409
eng_t_11                                0.046621
aca_t_11                                0.135014
saf_s_11                                0.269115
com_s_11                                0.161323
eng_s_11                                0.165075
aca_s_11                                0.282836
saf_tot_11                              0.278558
com_tot_11                              0.085832
eng_tot_11                              0.093731
aca_tot_11                              0.175131
Name: sat_score, dtype: float64

The output gives several insights worth exploring:

  • Surprisingly, total_enrollment correlates positively with SAT scores (r ≈ 0.38). Smaller schools might be expected to give students more attention and therefore achieve higher scores.
  • English language learners percentage (ell_percent) is negatively correlated with SAT scores (r ≈ -0.37).
  • The correlations between SAT scores and survey responses are all rather small.
  • There is significant racial inequality in SAT scores: white_per and asian_per correlate positively with them, while black_per and hispanic_per correlate negatively.
  • The percentage of male students (male_per) correlates weakly and negatively with SAT scores (r ≈ -0.10), and the percentage of female students (female_per) weakly and positively (r ≈ 0.10).
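
With this many columns, it can help to keep only the stronger correlations and sort them by magnitude. The sketch below does this on a handful of values copied (rounded) from the output above; the 0.25 cut-off is an arbitrary choice made for this example.

```python
import pandas as pd

# A few correlations with sat_score, rounded from the output above
correlations = pd.Series({
    'frl_percent': -0.718,
    'total_enrollment': 0.377,
    'ell_percent': -0.375,
    'white_per': 0.649,
    'female_per': 0.103,
    'sped_num': 0.056,
})

threshold = 0.25  # arbitrary cut-off, chosen only for illustration

# Keep correlations whose absolute value exceeds the threshold
strong = correlations[correlations.abs() > threshold]
# Order them from strongest to weakest, regardless of sign
strong = strong.reindex(strong.abs().sort_values(ascending=False).index)
print(strong)
```

On the real data, the same two steps could be applied to the Series returned by the .corr() call.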

Putting the schools on the map

A good way to set the context for the problem that is being analysed is with charts or maps.

In this case, the location of each school will be plotted on a NYC map.

The maps will be constructed using Folium, a library that takes advantage of the data wrangling versatility of Python and the mapping strengths of Leaflet.

The code below will:

  • Create a base map for NYC.
  • Make a marker cluster, and add it to the base map.
  • Create a marker for each school, and add them to the marker cluster.
  • Save the map as an .html file.
  • Show the map.
In [29]:
import folium
from folium import plugins
# Create a base map, centred on the mean of the coordinates for all schools
schools_map = folium.Map(location=[full['lat'].mean(), full['lon'].mean()], zoom_start=10)
# Create a marker cluster, and add it to the base map
marker_cluster = plugins.MarkerCluster().add_to(schools_map)
for _, row in full.iterrows():
  # Create a marker for each school, and add it to the marker cluster
  folium.Marker([row['lat'], row['lon']], popup='{0}: {1}'.format(row['DBN'], row['school_name'])).add_to(marker_cluster)
schools_map.save('schools.html')
schools_map
Out[29]:

Making a heat map

The previous map is useful, but it makes it hard to see where schools are concentrated. A heat map shows this more clearly.

In [30]:
# Create a base map, centred on the mean of the coordinates for all schools
schools_heatmap = folium.Map(location=[full['lat'].mean(), full['lon'].mean()], zoom_start=10)
# Create a heatmap layer, and add it as a child to the base map
schools_heatmap.add_child(plugins.HeatMap([[row['lat'], row['lon']] for name, row in full.iterrows()]))
schools_heatmap.save('heatmap.html')
schools_heatmap
Out[30]:

District-level mapping

Heat maps allow for the visualisation of gradients but aren't very useful for showing differences across geographical boundaries (like districts).

So it will be necessary to compute metrics by district, in the following way:

  • Group full by district.
  • Compute the mean of each numeric column by district.
  • Reset the index, so that school_dist becomes a regular column again.
  • Remove leading zeros from the district codes.
In [31]:
# Average every numeric column per district (text columns cannot be averaged)
district_data = full.groupby('school_dist').mean(numeric_only=True).reset_index()
# Remove leading zeros
district_data['school_dist'] = district_data['school_dist'].apply(lambda x: str(int(x)))
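
The steps above can be followed on a small table. The toy DataFrame below is invented for this sketch; it assumes district codes stored as two-character strings such as '01'. Stripping the leading zeros presumably makes the codes match the keys in the district boundaries file, but that is an assumption about the file rather than something shown here.

```python
import pandas as pd

# Hypothetical schools in three districts
toy = pd.DataFrame({
    'school_dist': ['01', '01', '02', '10'],
    'school_name': ['A', 'B', 'C', 'D'],
    'sat_score': [1200.0, 1300.0, 1100.0, 1500.0],
})

# Group by district, average the numeric columns, and restore a default index
by_district = toy.groupby('school_dist').mean(numeric_only=True).reset_index()

# '01' -> '1', '10' stays '10'
by_district['school_dist'] = by_district['school_dist'].apply(lambda code: str(int(code)))
print(by_district)
```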

The code below defines a function to plot district-level metrics. Here, it is used to plot the average SAT score per district.

To be able to do this, the district.geojson dataset will be used. This dataset contains GeoJSON data of the boundaries of each district.

In [32]:
def show_district_map(col):
  geo_path = 'https://data.cityofnewyork.us/api/geospatial/r8nu-ymqj?method=export&format=GeoJSON'
  # Create a base map
  districts = folium.Map(location=[full['lat'].mean(), full['lon'].mean()], zoom_start=10)
  # Create a choropleth map, and add it to the base map
  folium.Choropleth(
      geo_data=geo_path,
      name=col,
      data=district_data,
      columns=['school_dist', col],
      key_on='feature.properties.school_dist',
      fill_color='OrRd',
      fill_opacity=0.7,
      line_opacity=0.2,
      ).add_to(districts)
  # Add a control to select which layers to show on the map
  folium.LayerControl().add_to(districts)
  districts.save('districts.html')
  return districts
show_district_map('sat_score')
Out[32]:

Visualising correlations

The following plots visualise the relationships that were noticed when computing correlations, to understand them better.

SAT score vs enrolment

The relationship between SAT score and enrolment can be investigated using a scatter plot.

In [33]:
sns.set_theme()
_=sns.scatterplot(data=full,x='total_enrollment',y='sat_score')

There is a big cluster on the bottom left, with low total enrolment and low SAT scores.

Also, there seems to be a small positive correlation between SAT scores and total enrolment.

The low-scoring cluster can be explored further by listing the schools with the lowest SAT scores.

In [34]:
display(full.sort_values(by='sat_score')['SCHOOL NAME'].head(10))
338                        MULTICULTURAL HIGH SCHOOL
301    INTERNATIONAL HIGH SCHOOL AT PROSPECT HEIGHTS
188            INTERNATIONAL SCHOOL FOR LIBERAL ARTS
240                    HIGH SCHOOL OF WORLD CULTURES
122              INTERNATIONAL COMMUNITY HIGH SCHOOL
170              ACADEMY FOR LANGUAGE AND TECHNOLOGY
378           PAN AMERICAN INTERNATIONAL HIGH SCHOOL
185            KINGSBRIDGE INTERNATIONAL HIGH SCHOOL
317                       IT TAKES A VILLAGE ACADEMY
171                  BRONX INTERNATIONAL HIGH SCHOOL
Name: SCHOOL NAME, dtype: object

A quick web search shows that most of these schools serve students who are learning English. This suggests that the correlation with total enrolment is not driven by school size itself, but by the fact that many small schools have a high share of English language learners.
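
One way to probe this kind of conjecture is to recompute the correlation after leaving out the schools with a high share of English language learners. The sketch below uses invented numbers, built so that the small schools are also the high-ELL ones, and an arbitrary 50 % cut-off; it only illustrates the check and says nothing about the real data.

```python
import pandas as pd

# Hypothetical schools: the three smallest ones have a very high ELL share
toy = pd.DataFrame({
    'total_enrollment': [150, 200, 180, 900, 1500, 2500, 600, 3000],
    'ell_percent': [85, 90, 80, 10, 5, 3, 12, 4],
    'sat_score': [950, 930, 960, 1250, 1300, 1400, 1350, 1280],
})

# Correlation between enrolment and SAT score over all schools
r_all = toy['total_enrollment'].corr(toy['sat_score'])

# Same correlation, restricted to schools below an arbitrary ELL cut-off
low_ell = toy[toy['ell_percent'] <= 50]
r_low_ell = low_ell['total_enrollment'].corr(low_ell['sat_score'])

print(round(r_all, 2), round(r_low_ell, 2))
```

If the correlation drops sharply once the high-ELL schools are removed, as it does in this toy table, enrolment is probably standing in for the ELL share.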

SAT score vs English language learners percentage

To check this conjecture, a scatter plot of SAT scores vs English language learners percentage (ell_percent) is made.

In [35]:
_=sns.scatterplot(data=full,x='ell_percent',y='sat_score')

There is a cluster with high ell_percent and low SAT scores on the bottom right of the plot.

By plotting the English language learners percentage by district, it can be compared to the previous map of SAT scores by district.

In [36]:
show_district_map('ell_percent')
Out[36]:

As seen from both maps, districts with a high percentage of English language learners (ELL) tend to have lower SAT scores.

SAT scores vs survey scores

Intuitively, a strong correlation can be expected between survey scores and SAT scores. To check if this is true, the correlation between these variables can be plotted.

In [37]:
_=sns.barplot(
    x='index',
    y='sat_score',
    data=full.corr(numeric_only=True)['sat_score'][['rr_s', 'rr_t', 'rr_p', 'N_s', 'N_t', 'N_p', 'saf_tot_11', 'com_tot_11', 'aca_tot_11', 'eng_tot_11']].reset_index()
    )
_=plt.xticks(rotation=90)

Unexpectedly, the variables with the highest correlation are N_s, N_t, and N_p (the number of student, teacher, and parent respondents, respectively). All three correlate strongly with total enrolment, so they are likely affected by the same ell_percent effect.
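
The claim that the respondent counts track enrolment can be checked with DataFrame.corrwith, which correlates several columns against a single Series in one call. The values below are invented for this sketch; on the real data the same call would be made on full.

```python
import pandas as pd

# Hypothetical respondent counts that grow with school size
toy = pd.DataFrame({
    'total_enrollment': [200, 500, 1000, 2000, 3000],
    'N_s': [150, 400, 800, 1500, 2400],
    'N_t': [15, 30, 60, 110, 160],
    'N_p': [60, 120, 300, 500, 800],
})

# Pearson correlation of each respondent count with total enrolment
track = toy[['N_s', 'N_t', 'N_p']].corrwith(toy['total_enrollment'])
print(track.round(3))
```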

The next strongest correlation is with rr_s, the student response rate. This makes sense, as a more engaged student body is more likely to answer the survey and to do better on tests.

It is followed by saf_tot_11, the total safety and respect score. This also makes sense, as a safer environment makes learning easier for students.

But none of the other metrics correlates substantially with SAT scores. This might indicate that there is some kind of problem with the questions being asked.

Race and SAT scores

Another angle to explore is the relation between race and SAT scores. The correlations with SAT scores differed markedly across the racial composition columns, which can be clearly seen by plotting them.

In [38]:
_=sns.barplot(
    x='index',
    y='sat_score',
    data=full.corr(numeric_only=True)['sat_score'][['white_per', 'asian_per', 'black_per', 'hispanic_per']].reset_index()
    )
_=plt.xticks(rotation=90)

Apparently, higher percentages of White and Asian students correlate with higher SAT scores, and higher percentages of Black and Hispanic students correlate with lower SAT scores.

It can be hypothesised that, for Hispanic students, this is partly because many are recent immigrants and therefore English language learners.

A Hispanic percentage by district map can help shed some light on this.

In [39]:
show_district_map('hispanic_per')
Out[39]:

There seems to be a correlation between Hispanic students percentage and English language learners, but the topic needs to be analysed further.

Gender vs SAT scores

Finally, the correlation between gender and SAT scores will be explored. It was noticed that a higher percentage of female students correlates positively with SAT scores. This can be visualised with a bar plot.

In [40]:
_=sns.barplot(
    x='index',
    y='sat_score',
    data=full.corr(numeric_only=True)['sat_score'][['male_per', 'female_per']].reset_index()
    )
_=plt.xticks(rotation=90)

To further analyse this correlation, a scatter plot can be made.

In [41]:
_=sns.scatterplot(data=full,x='female_per', y='sat_score')

The single-sex schools sit at the extreme left (all male) and extreme right (all female) of the plot.

Also, there is a cluster with a high percentage of females (greater than 65 %) and very high SAT scores. To try to understand the nature of this cluster, a list of these schools can be printed.

In [42]:
display(full[(full['female_per'] > 65) & (full['sat_score'] > 1400)].sort_values(by='SCHOOL NAME')['SCHOOL NAME'])
8                         BARD HIGH SCHOOL EARLY COLLEGE
33                         ELEANOR ROOSEVELT HIGH SCHOOL
84     FIORELLO H. LAGUARDIA HIGH SCHOOL OF MUSIC & A...
445         FRANK SINATRA SCHOOL OF THE ARTS HIGH SCHOOL
28              PROFESSIONAL PERFORMING ARTS HIGH SCHOOL
51                          TALENT UNLIMITED HIGH SCHOOL
397                          TOWNSEND HARRIS HIGH SCHOOL
Name: SCHOOL NAME, dtype: object

A web search reveals that these are selective schools, several of them centred on the performing arts. These schools tend to have a higher percentage of females, and higher SAT scores.

AP scores

Now that the demographic angle has been analysed, a final relationship will be explored: the one between the proportion of students taking AP exams and SAT scores. The correlation between these metrics can be expected to be high, as students who take advanced courses tend to perform better on tests.

In [43]:
# Compute proportion of AP test-takers
full['ap_avg'] = full['AP Test Takers '] / full['total_enrollment']
_=sns.scatterplot(data=full,x='ap_avg', y='sat_score')

The plot shows that there seems to be a strong positive correlation between the two variables.

In the top right of the plot, there is a cluster with very high SAT scores and a high proportion of AP test-takers. A list of these schools can be printed to examine them.

In [44]:
display(full[(full['ap_avg'] > .3) & (full['sat_score'] > 1700)].sort_values(by='SCHOOL NAME')['SCHOOL NAME'])
199                         BRONX HIGH SCHOOL OF SCIENCE
250                       BROOKLYN TECHNICAL HIGH SCHOOL
33                         ELEANOR ROOSEVELT HIGH SCHOOL
207    HIGH SCHOOL OF AMERICAN STUDIES AT LEHMAN COLLEGE
428    QUEENS HIGH SCHOOL FOR THE SCIENCES AT YORK CO...
460                  STATEN ISLAND TECHNICAL HIGH SCHOOL
48                                STUYVESANT HIGH SCHOOL
397                          TOWNSEND HARRIS HIGH SCHOOL
Name: SCHOOL NAME, dtype: object

These seem to be highly selective schools, which require a test to get in, so it makes sense that they have a high proportion of AP test-takers.

Bringing the story to an end

In this notebook, a basic data science analysis was performed, bringing together different datasets and going through the whole workflow.

Several computations and visualisations were made to try to better understand the data. As a highlight, maps were used to help set up the context of the analysis.

This was a first approach to data science, done following a tutorial, and it should serve as a basis to perform a more custom-made project.